Evaluation of DRG programs can identify and describe key results, assess or improve the quality of program implementation, identify lessons that might improve the implementation of similar programs, or attribute changes in key outcomes to a program intervention. This section generally focuses on the last type of evaluation: impact evaluation, or determining the extent to which a program contributed to changes in outcomes of interest.
Attributing observed results to programs is perhaps the most difficult research challenge in the DRG program cycle. However, several evaluation research designs can help DRG practitioners determine whether programs have an effect on an outcome of interest, whether programs cause unintended outcomes, which of several alternatives is more likely to have had an effect, whether that effect is positive or negative, and how large that effect might be. Often, these methods can be used within the program cycle to optimize activities, especially within a Collaborating, Learning, and Adapting (CLA), adaptive management, or pilot-test-scale framework.
Programs to counter disinformation can take many forms with many possible intended results, ranging from small-scale trainings of journalists or public officials, to broader media literacy campaigns, to mass communications such as fact-checking or rating media outlets. There is no one-size-fits-all evaluation research approach that will work for every disinformation intervention. DRG program designers and implementers should consider consulting with internal staff and applied researchers, external evaluators, or academic researchers to develop an evaluation approach that answers research questions of interest to the program, accounting for practical constraints in time, labor, budget, scale, and M&E capacity.
Key Research Questions:
- Does a program or activity cause a measurable change in an outcome of interest? For example, did a media literacy program increase the capacity of participants to distinguish between true news and false news? Does a program cause unintended outcomes?
- What is the size of the effect or impact of an activity on an outcome of interest?
- What is the direction of the effect of an activity on an outcome of interest? For example, did a fact-checking program decrease confidence in false news reports, or did it cause increased acceptance of those reports through backlash?
Randomized or Experimental Approaches
Randomized evaluations (also commonly called randomized controlled trials, or RCTs, or field experiments) are often referenced as the gold standard for causal inference: determining whether and how an intervention caused an outcome of interest. Where they are logistically, financially, and ethically feasible, RCTs are the best available method for causal inference because they control for confounding variables, that is, factors other than the intervention that might have caused the observed outcome. RCTs control for these alternative explanations by randomly assigning participants either to one or more “treatment” groups (which receive a version of the intervention in question) or to a “comparison” or “control” group (which receives no intervention or placebo content). Because assignment is random, the groups are expected to be alike on average in every respect except the intervention, so systematic differences in outcomes between them, beyond what chance alone would produce, can be attributed to the intervention itself. In this way, RCTs can help practitioners and researchers estimate the effectiveness of an intervention.
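As a rough illustration of random assignment and the comparison it enables, the sketch below assumes a simple individual-level design with one treatment arm and a numeric outcome such as a quiz score. The function names are hypothetical, and real evaluations often use stratified or cluster randomization rather than a single shuffle.

```python
import random
import statistics


def randomly_assign(participant_ids, seed=None):
    """Shuffle participants and split them into two arms.

    Returns (treatment_ids, control_ids). With an odd number of
    participants, the control group receives the extra person.
    """
    rng = random.Random(seed)  # a fixed seed makes the assignment reproducible
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def difference_in_means(treatment_outcomes, control_outcomes):
    """Estimate the average treatment effect and its standard error.

    The standard error is the unpooled (Welch-style) version, which does
    not assume equal variances in the two groups.
    """
    estimate = statistics.mean(treatment_outcomes) - statistics.mean(control_outcomes)
    se = (
        statistics.variance(treatment_outcomes) / len(treatment_outcomes)
        + statistics.variance(control_outcomes) / len(control_outcomes)
    ) ** 0.5
    return estimate, se
```

Recording the seed lets evaluators document exactly how the assignment was produced; the difference in means is then compared with its standard error to judge whether it exceeds what chance would produce.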
The costs and logistical commitments for a randomized impact evaluation can be highly variable, depending in large part on the costs of outcome data collection. However, informational interventions, including those intended to counter disinformation, may be particularly amenable to randomized evaluations, because digital tools can support less expensive data collection than face-to-face methods like interviews or in-person surveys. Regardless of data collection methods, randomized evaluations require significant technical expertise and logistical planning. They will not be appropriate for every program, especially programs that operate at a relatively small scale, since randomized evaluations require large numbers of units of observation in order to detect statistically significant differences. Other impact evaluation methods differ in how they approximate randomization to measure the effect of interventions on observed outcomes, and may be more appropriate for certain program designs.
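To show why small programs struggle to support an RCT, here is a rough power calculation for comparing two equal-sized groups. It is a sketch that assumes individual-level randomization, a two-sided test, and the normal approximation. The function name is hypothetical, and designs that randomize clusters (such as villages or schools) generally need larger samples than this formula suggests.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(effect_size, alpha=0.05, power=0.80):
    """Approximate number of participants needed in each of two arms.

    effect_size is the minimum detectable difference in means, expressed
    in standard-deviation units. Uses the normal approximation for a
    two-sided test: n = 2 * (z_{1 - alpha/2} + z_{power})**2 / effect_size**2.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)
```

Under these assumptions, detecting an effect of 0.2 standard deviations at the conventional 5% significance level with 80% power requires roughly 393 participants per arm, while an effect of 0.5 standard deviations requires about 63.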
In 2020, RAND Corporation researchers, in partnership with IREX’s Learn2Discern program in Ukraine, conducted a randomized controlled trial to estimate the impact both of a Russian disinformation campaign and of a programmatic response that included content labeling and media literacy interventions. The experiment found that Russian propaganda produced emotional reactions and social media engagement among strong partisans, but that those effects were mitigated by labeling the source of the content and by showing recipients a short video on media literacy.
Quasi-Experimental and Non-Experimental Approaches
Researchers and evaluators may employ quasi-experimental or non-experimental approaches when random assignment to treatment and control is impractical or unethical. As the name suggests, these research designs attempt to attribute changes in outcomes to interventions by approximating random assignment to treatment and control conditions through comparisons. In most cases, this approximation involves collecting data on a population that did not participate in a program but is plausibly similar to program participants in other respects. Perhaps the most familiar of these methods for DRG practitioners is a pre-/post-test design, in which program participants are surveyed or tested on the same set of questions both before and after their participation in the program. For example, participants in a media literacy program might take a quiz that asks them to distinguish between true and false news, both before and after the program. In this case, the pre-test approximates a “control” or “comparison” group by measuring participants’ capacity before they receive the program, and the post-test measures the same participants’ capacity as a “treatment” group after they have received it. Any increase in the capacity to distinguish true from false news is then attributed to the program. Structured comparative case studies and process-tracing are examples of non-experimental designs that control for confounding factors through cross-case comparisons or through comparisons within the same case over time.
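The arithmetic behind a pre-/post-test comparison is simple, which is part of its appeal. The sketch below, with a hypothetical function name, computes the average within-participant change and a paired t statistic. The t statistic only indicates whether the change is distinguishable from zero; it says nothing about whether the program, rather than some other factor, caused that change.

```python
import statistics


def pre_post_change(pre_scores, post_scores):
    """Mean within-participant change and a paired t statistic.

    pre_scores[i] and post_scores[i] must belong to the same participant.
    """
    if len(pre_scores) != len(post_scores):
        raise ValueError("each participant needs both a pre- and a post-test score")
    changes = [post - pre for pre, post in zip(pre_scores, post_scores)]
    mean_change = statistics.mean(changes)
    # Standard error of the mean change across participants.
    se = statistics.stdev(changes) / len(changes) ** 0.5
    return mean_change, mean_change / se
```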
There are a variety of quasi-experimental and observational research methods available for program impact evaluation. The choice of these tools to evaluate the impact of a program depends on available data (or capacity to collect necessary data) and the assumptions that are required to identify reliable estimates of program impact. This table, reproduced in its entirety with the written consent of the Abdul Latif Jameel Poverty Action Lab, provides a menu of these options with their respective data collection requirements and assumptions.
| Method | Description | What assumptions are required, and how demanding are the assumptions? | Required data |
|---|---|---|---|
| Randomized Evaluation/Randomized Control Trial | Measure the differences in outcomes between randomly assigned program participants and non-participants after the program took effect. | The outcome variable is only affected by program participation itself, not by assignment to participate in the program or by participation in the randomized evaluation itself. Examples of such confounding effects could be information effects, spillovers, or experimenter effects. As with other methods, the sample size needs to be large enough so that the two groups are statistically comparable; the difference being that the sample size is chosen as part of the research design. | Outcome data for randomly assigned participants and non-participants (the treatment and control groups). |
| **Basic non-experimental comparison methods** | | | |
| Pre-Post | Measure the differences in outcomes for program participants before the program and after the program took effect. | There are no other factors (including outside events, a drive to change by the participants themselves, altered economic conditions, etc.) that changed the measured outcome for participants over time besides the program. In stable, static environments and over short time horizons, the assumption might hold, but it is not possible to verify that. Generally, a diff-in-diff or RDD design is preferred (see below). | Data on outcomes of interest for program participants before program start and after the program took effect. |
| Simple Difference | Measure the differences in outcomes between program participants after the program took effect and another group who did not participate in the program. | There are no differences in the outcomes of participants and non-participants except for program participation, and both groups were equally likely to enter the program before it started. This is a demanding assumption. Nonparticipants may not fulfill the eligibility criteria, live in a different location, or simply see less value in the program (self-selection). Any such factors may be associated with differences in outcomes independent of program participation. Generally, a diff-in-diff or RDD design is preferred (see below). | Outcome data for program participants as well as another group of nonparticipants after the program took effect. |
| Differences in Differences | Measure the differences in outcomes for program participants before and after the program relative to nonparticipants. | Any other factors that may have affected the measured outcome over time are the same for participants and non-participants, so they would have had the same time trajectory absent the program. Over short time horizons and with reasonably similar groups, this assumption may be plausible. A “placebo test” can also compare the time trends in the two groups before the program took place. However, as with “simple difference,” many factors that are associated with program participation may also be associated with outcome changes over time. For example, a person who expects a large improvement in the near future may not join the program (self-selection). | Data on outcomes of interest for program participants as well as another group of nonparticipants before program start and after the program took effect. |
| **More non-experimental methods** | | | |
| Multivariate Regression/OLS | The “simple difference” approach can be, and in practice almost always is, carried out using multivariate regression. Doing so allows accounting for other observable factors that might also affect the outcome, often called “control variables” or “covariates.” The regression filters out the effects of these covariates and measures differences in outcomes between participants and nonparticipants while holding the effect of the covariates constant. | Besides the effects of the control variables, there are no other differences between participants and non-participants that affect the measured outcome. This means that any unobservable or unmeasured factors that do affect the outcome must be the same for participants and nonparticipants. In addition, the control variables cannot in any way themselves be affected by the program. While the addition of covariates can alleviate some concerns with taking simple differences, limited available data in practice and unobservable factors mean that the method has similar issues as simple difference (e.g., self-selection). | Outcome data for program participants as well as another group of non-participants, as well as “control variables” for both groups. |
| Statistical Matching | Exact matching: participants are matched to non-participants who are identical based on “matching variables” to measure differences in outcomes. Propensity score matching uses the control variables to predict a person’s likelihood to participate and uses this predicted likelihood as the matching variable. | Similar to multivariate regression: there are no differences between participants and non-participants with the same matching variables that affect the measured outcome. Unobservable differences are the main concern in exact matching. In propensity score matching, two individuals with the same score may be very different even along observable dimensions. Thus, the assumptions that need to hold in order to draw valid conclusions are quite demanding. | Outcome data for program participants as well as another group of non-participants, as well as “matching variables” for both groups. |
| Regression Discontinuity Design (RDD) | In an RDD design, eligibility to participate is determined by a cutoff value in some order or ranking, such as income level. Participants on one side of the cutoff are compared to non-participants on the other side, and the eligibility criterion is included as a control variable (see above). | Any difference between individuals below and above the cutoff (participants and non-participants) vanishes closer and closer to the cutoff point. A carefully considered regression discontinuity design can be effective. The design uses the “random” element that is introduced when two individuals who are similar to each other according to their ordering end up on different sides of the cutoff point. The design accounts for the continual differences between them using control variables. The assumption that these individuals are similar to each other can be tested with observables in the data. However, the design limits the comparability of participants further away from the cutoff. | Outcome data for program participants and non-participants, as well as the “ordering variable” (also called “forcing variable”). |
| Instrumental Variables | The design uses an “instrumental variable” that is a predictor for program participation. The method then compares individuals according to their predicted participation, rather than actual participation. | The instrumental variable has no direct effect on the outcome variable. Its only effect is through an individual’s participation in the program. A valid instrumental variable design requires an instrument that has no relationship with the outcome variable. The challenge is that most factors that affect participation in a program for otherwise similar individuals are also in some way directly related to the outcome variable. With more than one instrument, the assumption can be tested. | Outcome data for program participants and non-participants, as well as an “instrumental variable.” |
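To make the contrast between the first three comparison methods in the table concrete, the sketch below computes the pre-post, simple difference, and difference-in-differences estimates from the same data. The function names and the scores used to illustrate them are hypothetical and purely illustrative. The difference-in-differences estimate is only valid under the parallel-trends assumption described in the table.

```python
import statistics


def pre_post_estimate(treat_pre, treat_post):
    """Change among participants only (the Pre-Post row)."""
    return statistics.mean(treat_post) - statistics.mean(treat_pre)


def simple_difference(treat_post, comp_post):
    """Participants vs. non-participants after the program (Simple Difference)."""
    return statistics.mean(treat_post) - statistics.mean(comp_post)


def difference_in_differences(treat_pre, treat_post, comp_pre, comp_post):
    """Participants' change minus non-participants' change over the same period."""
    treat_change = statistics.mean(treat_post) - statistics.mean(treat_pre)
    comp_change = statistics.mean(comp_post) - statistics.mean(comp_pre)
    return treat_change - comp_change
```

Consider hypothetical participant scores averaging 52 before and 65 after the program, and non-participant scores averaging 49 and 56 over the same period. The pre-post estimate is 13 points and the simple difference is 9. The difference-in-differences estimate is 6, because it nets out the 7-point change that non-participants experienced without the program.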
Media Monitoring and Content Analysis
Media monitoring and content analysis approaches generally aim to answer research questions about whether, how, or why interventions change audience engagement with information or the nature or quality of the information itself. For example, a fact-checking program might hypothesize that correcting disinformation should result in less audience engagement with outlets for disinformation on social media, as measured by views, likes, shares, or comments.
Several tools are available to help DRG practitioners and researchers identify changes in media content. Content analysis is a qualitative research approach through which researchers can identify key themes in written, audio, or video material, and whether those themes change over time. Similarly, sentiment analysis can help identify the nature of attitudes or beliefs around a theme.
Both content and sentiment analysis can be conducted using human or machine-assisted coding and should be conducted at multiple points in the program cycle in conjunction with other evaluation research designs for project impact evaluation.
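Machine-assisted coding can be as simple as applying a keyword dictionary to each item and comparing theme counts across data collection rounds. The sketch below assumes a hypothetical codebook and period labels. Dictionary matching misses context such as negation or sarcasm, so in practice human coders would typically validate a sample of machine-coded items.

```python
import re
from collections import Counter

# Hypothetical codebook: theme -> keywords that signal it.
CODEBOOK = {
    "election_fraud": {"rigged", "fraud", "stolen"},
    "health": {"vaccine", "virus", "cure"},
}


def code_text(text, codebook):
    """Return the set of themes whose keywords appear in the text."""
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    return {theme for theme, words in codebook.items() if tokens & words}


def theme_counts_by_period(items, codebook):
    """Count coded themes per period.

    items is an iterable of (period, text) pairs, e.g. ("baseline", "...").
    Returns {period: Counter({theme: number_of_items_mentioning_it})}.
    """
    counts = {}
    for period, text in items:
        counts.setdefault(period, Counter()).update(code_text(text, codebook))
    return counts
```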
Network analysis is a method for understanding how and why the structure of relationships between actors affects an outcome of interest. Network analysis is a particularly useful research method for programs that counter disinformation, because it allows analysts to visualize and understand how information is disseminated through online networks, including social media platforms, discussion boards, and other digital communities. By synthesizing information on the number of actors, the frequency of interactions between actors, the quality or intensity of interactions, and the structure of relationships, network analysis can help researchers and practitioners identify key channels for the propagation of disinformation, the direction of transmission of information or disinformation, clusters denoting distinct informational ecosystems, and whether engagement or amplification is genuine or artificial. In turn, network metrics can help inform the design, content, and targeting of program activities. To the extent analysts can collect network data over time, network analysis can also inform program monitoring and evaluation.
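A minimal sketch of two of the network metrics mentioned above follows. It assumes reshare data arrive as hypothetical (source, resharer) pairs. Counting how often each account's content is reshared highlights candidate key channels and the direction of transmission, while connected components give a crude view of distinct clusters. Established libraries such as NetworkX offer these and many more measures. Distinguishing genuine from artificial amplification typically requires additional signals, such as posting times and account characteristics, that this sketch does not use.

```python
from collections import Counter, defaultdict


def times_reshared(edges):
    """Count how often each account's content was reshared.

    edges is an iterable of (source, resharer) pairs: `resharer` reshared
    content posted by `source`, so information flows source -> resharer.
    The count is each source's out-degree in that directed graph.
    """
    return Counter(source for source, _resharer in edges)


def clusters(edges):
    """Group accounts into weakly connected components (direction ignored)."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, components = set(), []
    for start in list(neighbors):
        if start in seen:
            continue
        # Depth-first search from `start` collects every reachable account.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        components.append(component)
    return components
```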
Data collection tools for network analysis depend on the nature of the network generally, and the network platform specifically. Network analysis can be conducted on offline networks where researchers have the capacity to collect data using standard face-to-face, telephone, computer-assisted, or SMS survey techniques. In these cases, researchers have mapped offline community networks using survey instruments that ask respondents to list individuals or organizations that are particularly influential, or whom they might approach for a particular task. Researchers can then map networks by aggregating and coding responses from all community respondents. In this way, researchers might determine which influential individuals in a community might be nodes for the dissemination of information, particularly in contexts where people rely largely on family and friends for news or information.
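Aggregating survey nominations can be sketched as a simple tally, assuming each response records the people a respondent named. The function name and response format are hypothetical. Each (respondent, named person) pair can also be treated as a directed edge for fuller network analysis.

```python
from collections import Counter


def nomination_counts(responses):
    """Tally how often each person is named across survey responses.

    responses maps a respondent ID to the list of people that respondent
    named (e.g., in answer to "Whom would you ask about local news?").
    Self-nominations and duplicate names within one response are ignored.
    """
    counts = Counter()
    for respondent, named in responses.items():
        counts.update(set(named) - {respondent})
    return counts
```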
Digital platforms such as social media, however, can reduce the costs of network data collection, depending on each platform’s APIs and terms of service. With dedicated tools, including social network analysis software, researchers can analyze and visualize relationships between users, including content engagement, following relationships, and liking or sharing. These tools can provide practitioners with an understanding of the structure of online networks and, in conjunction with content analysis tools, of how network structure interacts with particular kinds of content.
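Combining network data with content analysis can be sketched as follows. The sketch assumes hypothetical engagement records of the form (user, kind, post ID) and a set of post IDs that content analysis has labeled as disinformation. For each user, it computes the share of post engagements that involve flagged content. Follow relationships are not engagements with a post, so they are skipped by default.

```python
from collections import defaultdict


def flagged_engagement_share(engagements, flagged_posts, kinds=("share", "like", "comment")):
    """Share of each user's post engagements that involve flagged content.

    engagements: iterable of (user, kind, post_id) records, e.g. collected
      through a platform API where its terms of service allow it.
    flagged_posts: set of post IDs that content analysis (human or
      machine-assisted) has labeled as disinformation.
    kinds: engagement types to count; anything else (e.g. "follow") is skipped.
    """
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for user, kind, post_id in engagements:
        if kind not in kinds:
            continue
        totals[user] += 1
        if post_id in flagged_posts:
            flagged[user] += 1
    return {user: flagged[user] / totals[user] for user in totals}
```

These per-user shares could then be compared across the clusters identified through network analysis.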
- Several researchers have argued against the use of the “quasi-experimental” descriptor, noting that either the researcher has control over the assignment of units to treatment or control, or they do not. We retain the term given its common usage to refer to methods like pre/post designs, regression discontinuity, instrumental variables, difference-in-differences, and matching, but subsume both quasi- and non-experimental methods into one category, acknowledging that each controls for confounding factors through various types of comparisons.
- The same individuals comprise both the treatment and control groups in this analogy, and there are many reasons aside from the intervention, including participant selection, that could plausibly account for changes in outcomes between pre-tests and post-tests. For example, the media literacy program might be advertised to potential participants who are connected to implementing organizations in some way and may therefore be wealthier or more educated than the average citizen. In this case, some characteristic of the participant population (e.g., education or learning capacity) could drive increases in test scores between pre- and post-tests, independent of any of the program content. As a result, the pre-/post-test design could lead researchers or practitioners to overestimate the actual effect of the program.
- See, for example, Wibbels, Erik. “The Social Underpinnings of Decentralized Governance: Networks, Technology, and the Future of Social Accountability.” In Decentralized Governance and Accountability: Academic Research and the Future of Donor Programming, 14–40. New York: Cambridge University Press, 2019.