by Arthur Jatteau
About the author
Arthur Jatteau obtained a PhD in sociology and economics in 2016 for his thesis entitled “Evidence by numbers? The case of randomized controlled trials in economics”, for which he won the French Court of Auditors’ thesis prize.
Randomised controlled trials (RCTs) have gained remarkable traction in economics since the 2000s, thanks in no small part to Esther Duflo, Director of the Abdul Latif Jameel Poverty Action Lab (J-PAL). Proponents hold up the method as a particularly robust impact evaluation tool. Although little is known about its origins, examining the backgrounds of those who use it – especially their academic and career paths – goes some way to explaining why it has become such a success. Digging deeper into what the theory involves, and how it came to be applied in practice, sheds light on the inevitable compromises that shaped its rise to prominence and on how an obsession with quantification can affect policy-making.
Esther Duflo stands alongside Thomas Piketty and Jean Tirole as one of the world’s most prominent French economists. She has won a host of prizes and awards, including the Le Monde and Le Cercle des Économistes Best Young French Economist Prize (2005) and the American Economic Association’s John Bates Clark Medal (2010). She is a professor at the Massachusetts Institute of Technology (MIT), where she is Co-founder and Director of the Abdul Latif Jameel Poverty Action Lab (J-PAL). Duflo is a keen proponent of randomised controlled trials (RCTs) – a method in which participants are randomly assigned to groups to assess and measure the impact of interventions as disparate as drugs and microcredit. Commonly used in medicine, RCTs have become popular in development and, increasingly, in public policy evaluation – with great success. The J-PAL lab has run more than 600 RCTs to date, on top of many more conducted by the World Bank. But how do RCTs actually work? What do they involve and what makes them different from other methods? And what are their benefits and drawbacks?
Evaluating a policy or programme is far from straightforward. To understand why it can be so complex, let’s look at an initiative that involves giving free laptop computers to high-school pupils. An authority might decide to do this for any number of reasons – ensuring equal access to information and communication technologies, cultivating new teaching and learning practices, or simply improving pupils’ attainment. To keep things simple, let’s imagine that the aim is merely to boost pupils’ attainment. The project sponsor quite rightly wants to know what impact the intervention has had. In other words, did it “work”? The purpose of evaluation is to answer this very question – to ascertain whether “something” (be it a public policy, an aid programme or a specific drug) actually makes a positive difference.
How does evaluation work?
The challenge, when answering a question like this, is to determine what “would have happened” in the absence of the intervention. This is known as “counterfactual impact evaluation”. For argument’s sake, let’s imagine I take an aspirin because I have a headache. One hour later, the pain has gone away. So can I conclude that the drug worked? Absolutely not. To draw that conclusion, I would need to know whether the pain would have gone away if I hadn’t taken the pill. After all, headaches could well disappear naturally on their own. In this individual case, therefore, I cannot establish causality. The same holds true for individual high-school pupils. Imagine a pupil’s attainment improves a few months after receiving a free laptop. Here too, it is impossible to establish a cause-and-effect relationship because the same might have happened without the laptop.
One way around this problem is to think in terms of groups rather than individuals. Let’s take two groups, A and B. Members of both groups are sufficiently similar for comparison purposes (e.g. average age, gender distribution, attainment) and we can assume that both would have similar outcomes if they did not receive free laptops. Therefore, if group A are given free laptops and group B are not, we might expect group B’s attainment at the end of the academic year to match what group A would have attained had they not received the laptops.
Of course, creating comparable groups poses a real challenge. There are several ways to do this, but random assignment is the most compelling method. Provided the sample size is large enough (according to statistical rules of thumb), randomly assigning pupils makes it highly likely that the groups will be similar in terms of both measurable variables (such as attainment) and non-measurable variables (such as talent).
This is precisely how RCTs work – dividing a target population into two groups. In our example, the experimental group receives free laptops and the control group does not, then the groups are later assessed against certain indicators to measure the differences between them and to determine what impact the intervention had.
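The logic of random assignment and group comparison can be sketched in a few lines of code. The simulation below is purely illustrative: the pupil scores, the `run_mock_rct` helper and the size of the “treatment effect” are all invented for the example and do not come from any real trial.

```python
import random
import statistics

def run_mock_rct(baseline_scores, treatment_effect, seed=1):
    """Randomly split pupils into a treatment and a control group,
    apply a hypothetical treatment effect to the treated group, and
    return the difference in mean outcomes (the estimated impact)."""
    rng = random.Random(seed)
    shuffled = baseline_scores[:]
    rng.shuffle(shuffled)  # random assignment to groups
    half = len(shuffled) // 2
    treatment, control = shuffled[:half], shuffled[half:]
    # Hypothetical intervention (e.g. free laptops) adds a fixed
    # number of points to each treated pupil's score.
    treated_outcomes = [s + treatment_effect for s in treatment]
    return statistics.mean(treated_outcomes) - statistics.mean(control)

# Simulated baseline attainment scores for 1,000 pupils
# (mean 50, standard deviation 10 -- arbitrary numbers).
rng = random.Random(42)
pupils = [rng.gauss(50, 10) for _ in range(1000)]

estimate = run_mock_rct(pupils, treatment_effect=2.0)
print(round(estimate, 2))  # close to the true effect of 2.0
```

With a sample this large, random assignment makes the two groups similar on average, so the difference in group means recovers something close to the true effect; with only a handful of pupils, chance imbalances between the groups would swamp it.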
Esther Duflo and her colleagues are often credited with developing RCTs. In reality, however, the method is much older and its use – in economics and other disciplines – dates back a relatively long way. Evaluation plays an important role in many fields, several of which – not least psychology, medicine, education and agronomics – have been instrumental in developing RCTs as they exist today.
The method has its roots in the gradual adoption of the group trial model, and in a growing acceptance of the benefits of random assignment in evaluation. The evidence suggests that random assignment can be traced back to the late 19th century, when it was first used in psychophysics (a branch of psychology that measures sensations).
In any event, its origins cannot be pinned down to one specific factor and, therefore, one specific discipline. It seems, however, that medicine played a key role in stabilising and spreading the RCT method in the aftermath of the Second World War, with the advent of randomised clinical trials. The 1960s saw widespread adoption of RCTs as a public policy evaluation tool in the United States. So while Esther Duflo and her colleagues cannot be credited with pioneering the method, they were the first researchers to apply RCTs to international development – especially on such a scale.
The J-PAL lab has run hundreds of RCTs since its inception in 2003, spanning policy areas including education, health, microfinance, employment, governance, agriculture and the environment. Here, we provide a brief overview of some of these trials.
Absenteeism among teachers and pupils is a major problem for schools in developing countries. Although enrolment rates are on the rise, many pupils still fail to attend school regularly. A number of RCTs have been conducted in an effort to combat truancy. In Kenya, where school is free to attend but pupils have to buy a compulsory uniform, one trial looked at whether providing uniforms free of charge would improve attendance rates. Pupils were assigned randomly to two groups – one receiving the uniform free of charge and the other having to pay for it. The researchers found a 43% reduction in truancy among the experimental group (i.e. the group that had been given free uniforms). Other interventions have been trialled in Kenyan schools, including free worming tablets and free school lunches – again, with compelling results. A separate trial in India looked at whether financial incentives could cut absenteeism among teachers. Teachers were given digital cameras with a tamper-proof time- and date-stamp function and asked to take photos of themselves with their class at the start and end of every day to prove they had attended. Although the method was ethically dubious, it proved effective at boosting attendance rates.
Even when pupils attend school, they often end up learning very little. Class sizes are one possible cause of this problem. A trial in Kenya looked at the effect of halving class sizes (from 80 to 40 pupils). The researchers found no significant improvement in attainment among the experimental group when compared with the control group, although it is hard to imagine that smaller class sizes have no positive impact in general. Researchers also tested another intervention: free textbooks. Yet again, the results were disappointing. Attainment is a problem in developed countries too. An RCT in the Netherlands explored the impact of attainment-based bonuses but found they had no significant impact. In a separate trial in France, the Créteil local education authority instructed participating schools to hold three information meetings for parents of sixth-grade pupils to explain how to support their children’s learning. The initiative produced positive results.
Aside from education, RCTs are also common practice in health policy evaluation. Unsafe sexual practices are a major public health concern, especially in developing countries. In a successful trial in Tanzania, participants were offered financial rewards if they managed not to contract sexually transmitted infections. A similar trial was run in Kenya, although this time the results were much less persuasive. This discrepancy raises the thorny question of whether trial results can be generalised. We will return to that question later on.
One of the nagging debates in health economics is whether health products should be fully subsidised (which widens access but risks money being spent unnecessarily on people who can afford to pay for them), or whether people should be made to pay all or part of the cost (which targets resources where they are needed most but risks excluding the poorest). An RCT in Kenya aimed to answer this question. The trial population was split randomly into two groups. The first group were given free mosquito nets, and the second were given nets at a discounted price. The researchers observed better outcomes in the first group than in the second, suggesting that it is better to fully subsidise the cost of these products – at least in this particular context. Another preventive healthcare trial, this time in India, measured the impact of child immunisation incentives. Although the incentive was small in value (a bag of lentils), it produced significant results.
Labour economists have been grappling with the issue of unemployment for the past four decades. RCTs have been used to test some of the many recommended policy measures. In Denmark, for instance, researchers explored whether scheduling more regular appointments at job centres would make a difference to jobseekers. The trial produced extremely positive outcomes, although a similar one in the Netherlands proved much less convincing. Once again, these findings raise questions about whether RCTs have much value beyond the narrow circumstances in which they are conducted.
A trial in France sought to compare outcomes for jobseekers on public placement schemes with those on private-sector schemes, tackling a long-standing debate in public economics around the relative merits of the public and private sectors. Jobseekers were randomly assigned to two groups – one public, and one private. Better outcomes were observed in the first group than the second. In a separate trial in France, researchers took the novel step of exploring whether subsidising driving lessons would make a difference to the employment prospects of disadvantaged young people. As expected, those who received the subsidised lessons were more likely to pass their driving test, but the trial found no short-term employability benefits.
RCTs have a number of drawbacks. We explore just some of them below, without touching on ethical issues (which can be particularly concerning when the trials involve individuals) or on technical considerations (which are best left to econometricians). Instead, we focus on those drawbacks that have more to do with public policy implementation.
Although RCTs might seem alluring in theory, applying the method in practice poses the same challenges that come with any experimental process. Practitioners are sometimes reluctant to assign trial participants randomly or create a control group because those individuals placed in the “wrong” group would be deprived of the services that the trial is testing. Trial participants can also skew the results, for instance by changing the way they behave once they have been randomly assigned to a group (so-called “randomisation bias”). Consequently, any impact observed by the researchers cannot be ascribed solely to the intervention itself, since the very fact that the participants are involved in a trial influences the results. More generally, the Hawthorne effect describes how trial participants modify their behaviour when they are aware that they are being observed. In our previous example, there is every possibility that the pupils might work harder simply because they know they are taking part in a trial and because they appreciate the added attention that this brings, independently of the intervention that the trial is trying to evaluate.
Another drawback, this time affecting the applicability of RCT results, stems from the fact that the researchers who conduct such trials work to much longer time scales than the policy-makers who order them or use their results. Politicians, who are constrained by electoral cycles, often cannot wait for the results of RCTs to be published, while researchers are often unable to publish their findings quickly enough to hold politicians’ interest. The rift between these two worlds goes some way to explaining why RCTs have lost momentum in the United States since the 1980s.
The question we must therefore ask is this: what can we actually learn from RCTs? In other words: can the results of an RCT be generalised? There is no guarantee that the findings of a single trial, run by one team, in one place, and at one point in time, can help us to draw universal conclusions. In the literature, this issue is known as “external validity”: to what extent are the results of a trial “valid” and do they still hold true outside the context of that particular trial?
Regrettably, the literature appears to sideline what remains a fundamental question. It goes without saying that the results of a single RCT cannot be generalised. Nor can the results of two similar trials in two different sets of circumstances, since there is no guarantee that a third will produce the same outcomes. We cannot know whether the results of a trial can be generalised without first examining the context in which that trial took place. We must look to the social sciences for the tools to help us interpret trial outcomes. It is something of a paradox that, while RCTs are in vivo (as opposed to in vitro) experiments, the conclusions drawn from them still seem to give very little weight to actual conditions on the ground. If we are to overcome this problem, we must break free of the economic blinkers that constrain the work of Esther Duflo and her colleagues – and of a good number of economists in general.
The distinction between efficacy and causality is well established in medicine. Yet confusion between the two remains a real problem in evaluation. In other words, knowing that an intervention works does not tell us why it works. We can demonstrate that giving out free laptops has a positive impact on high-school pupils’ attainment. But understanding why is a separate matter altogether. Is it the software installed on the laptop that makes a difference? Do pupils perform better because the intervention boosts their self-esteem? Or can the improvements be attributed to more engaged parents and/or teachers? Yet again, understanding the underlying causal mechanisms demands thorough investigation using a range of methods – especially qualitative ones.
The growing popularity of RCTs can, and indeed should, be placed within the broader context of the role that quantification plays in today’s society. They are part of a trend that French sociologist Albert Ogien calls the “reduction of society to numbers”. Numbers are not a bad thing in and of themselves. Quite the opposite. But an impulsive, narrow-minded obsession with quantifying everything – and all but disregarding every other area of the social sciences – is problematic because it impoverishes our understanding of society.
RCTs are by no means the first time that quantification has been introduced into politics and policy-making. But randomly assigning participants to groups – a method held up by proponents as the final answer to the evaluation debate – lends RCTs a veneer of scientific objectivity while at the same time masking the fact that decisions such as what intervention to test, or what indicators to measure its impact by, are inherently political acts. Advocates of RCTs consider them the pinnacle of quantification – a way to circumvent ideological debate, and to sideline political considerations, through science and numbers. This somewhat debatable view seems to be upheld by Esther Duflo in her latest book, Poor Economics: A Radical Rethinking of the Way to Fight Global Poverty, co-authored with Abhijit Banerjee and published in 2011. Yet it is worth remembering that RCTs are no less susceptible to political influence than other evaluation methods – and arguably even more so.
Further reading
BANERJEE A.V. and DUFLO E., « L’approche expérimentale en économie du développement ». Revue d’économie politique, vol. 119, p. 691-726, 2009.
DUFLO E., HANNA R. and RYAN S.P., « Incentives work: getting teachers to come to school ». American Economic Review, vol. 102, n° 4, p. 1241-1278, 2012.
JATTEAU A., Les expérimentations aléatoires en économie, Paris, La Découverte, 2013.
LABROUSSE A., « Nouvelle économie du développement et essais cliniques randomisés : une mise en perspective d’un outil de preuve et de gouvernement ». Revue de la régulation, n° 7, 2010.
MIGUEL E. and KREMER M., « Worms: identifying impacts on education and health in the presence of treatment externalities ». Econometrica, vol. 72, n° 1, p. 159-217, 2004.
Three questions for Arthur Jatteau
Focus on randomised controlled trials in economics
In this interview, Arthur Jatteau talks about the subject of his thesis, “Evidence by numbers”, focusing in particular on randomised controlled trials (RCTs). We asked Arthur to tell us more about the theoretical frameworks that shaped his analysis, how he carried out his field work, what data he used, and what conclusions he drew in his thesis.
Length of the interview: 14 minutes 38 seconds
Government in Action: Research and Practice is conceived in partnership with