Once again, we start by setting the working directory and loading the LaLonde (1986) data:
. cd "~/git_repos/metricsinstata/docs/part4"
/Users/jack/git_repos/metricsinstata/docs/part4
. use "nsw.dta", clear
Recall that this is data from a randomized trial in which some individuals are given employment training.
We would like to test whether the training affected the earnings of participants.
When confronted with data from a randomized experiment, one of the first checks should be to verify that the randomization has been effective.
A simple way to do that is to test whether pre-determined characteristics differ between the treatment and control groups. In the code below, I test whether average pre-treatment (1975) earnings differ by treatment status. I do this by running a t-test:
. ttest re75, by(treat)

Two-sample t test with equal variances
─────────┬────────────────────────────────────────────────────────────────────
   Group │     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
─────────┼────────────────────────────────────────────────────────────────────
       0 │     425    3026.683    252.2977     5201.25    2530.773    3522.593
       1 │     297    3066.098    282.8697    4874.889    2509.407    3622.789
─────────┼────────────────────────────────────────────────────────────────────
combined │     722    3042.897    188.5423    5066.143    2672.739    3413.054
─────────┼────────────────────────────────────────────────────────────────────
    diff │           -39.41544    383.4172               -792.1647    713.3338
─────────┴────────────────────────────────────────────────────────────────────
    diff = mean(0) - mean(1)                                      t =  -0.1028
Ho: diff = 0                                     degrees of freedom =      720

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.4591         Pr(|T| > |t|) = 0.9182          Pr(T > t) = 0.5409
The ttest command returns a lot of information. We see, for example, that mean earnings in the treated group were 39 dollars higher than in the control group. However, this difference is small relative to its standard error of about 383.
The bottom of the output contains the p-values for a variety of hypotheses. We see that we cannot reject the null hypothesis of the two means being equal. The p-value for this particular null hypothesis is 0.9182.
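To make the link between the difference, its standard error and the t-statistic concrete, the following is a minimal sketch that recomputes the statistic from the results ttest leaves behind in r(). It assumes the data are still loaded and relies on ttest's documented stored results (r(mu_1), r(mu_2), r(se), r(t)); the two t-statistics it prints should agree with each other and with the value in the output above.

```stata
* Recompute the t-statistic from the results that ttest stores in r()
quietly ttest re75, by(treat)

* r(mu_1) is the mean of the first group (treat == 0), r(mu_2) of the second
display "difference in means: " r(mu_1) - r(mu_2)
display "standard error:      " r(se)
display "t-statistic by hand: " (r(mu_1) - r(mu_2)) / r(se)
display "t-statistic stored:  " r(t)
```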
At least based on this variable, it seems that treatment has been effectively randomized. One could perform equivalent tests for all pre-determined variables available. To be thorough, one ought to adjust for multiple hypothesis testing but I will not address that here.
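One way to run the same check across several pre-determined variables is a simple loop. This is only a sketch: apart from re75 and treat, the variable names below (age, education, black, hispanic, married, nodegree) are assumptions about what nsw.dta contains and should be replaced with the names in your copy of the data. No multiple-testing adjustment is applied.

```stata
* Balance checks on pre-determined covariates.
* NOTE: variable names other than re75 and treat are assumed, not taken from the text.
foreach var of varlist age education black hispanic married nodegree re75 {
    quietly ttest `var', by(treat)
    * Print the control-minus-treated difference and the two-sided p-value
    display "`var'" _col(15) "diff = " %9.3f (r(mu_1) - r(mu_2)) _col(40) "p = " %6.4f r(p)
}
```

Running describe first is an easy way to confirm which variables are actually available before editing the list.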
Given the above evidence that randomization has been effective, I now compare outcomes between the treated and control groups. I will run a t-test to test whether post-experiment (1978) earnings differ between the treatment and the control group:
. ttest re78, by(treat)

Two-sample t test with equal variances
─────────┬────────────────────────────────────────────────────────────────────
   Group │     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
─────────┼────────────────────────────────────────────────────────────────────
       0 │     425    5090.048     277.368    5718.089    4544.861    5635.236
       1 │     297    5976.352    401.7594    6923.796    5185.685    6767.019
─────────┼────────────────────────────────────────────────────────────────────
combined │     722    5454.636    232.7105    6252.943    4997.765    5911.507
─────────┼────────────────────────────────────────────────────────────────────
    diff │           -886.3037    472.0863               -1813.134    40.52635
─────────┴────────────────────────────────────────────────────────────────────
    diff = mean(0) - mean(1)                                      t =  -1.8774
Ho: diff = 0                                     degrees of freedom =      720

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0304         Pr(|T| > |t|) = 0.0609          Pr(T > t) = 0.9696
We can reject the null hypothesis of equality of means at the 10% but not the 5% level. We would usually say that the difference is marginally significant.
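The test above assumes that earnings have the same variance in both groups, yet the reported standard deviations differ somewhat (about 5718 for the control group and 6924 for the treated group). As a robustness check, one could rerun the test without that assumption. The sketch below uses ttest's unequal option; whether the conclusion changes is not something the original text reports, so no output is shown here.

```stata
* Same comparison, but allowing the two groups to have different variances
* (Satterthwaite's approximation for the degrees of freedom)
ttest re78, by(treat) unequal

* Welch's approximation is available as an alternative
ttest re78, by(treat) unequal welch
```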
The assumptions required for a causal interpretation are:
Randomization
Stable unit treatment value (Rubin, 1978)
For more details, see Athey and Imbens (2016).
Under these assumptions, we can interpret the difference as an estimate of the average treatment effect on the treated (ATET). As the treatment group has earnings that are on average $886 higher than those of the control group, we estimate an ATET of $886, which is marginally statistically significant.
Based on this simple analysis, we conclude that on average the training programme increased the earnings of participants by about $886.
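The same difference in means can also be obtained from a regression of the outcome on the treatment indicator. The sketch below is an alternative to the t-test used above, not the approach taken in this section. The coefficient on treat equals mean(1) - mean(0), so it has the opposite sign to the diff row reported by ttest, and with default standard errors its t-statistic matches the equal-variance t-test in absolute value.

```stata
* Difference in mean 1978 earnings as a regression coefficient
regress re78 treat

* The same point estimate with heteroskedasticity-robust standard errors
regress re78 treat, vce(robust)
```

The regression form is convenient if one later wants to add pre-determined covariates as controls.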