10/16/2009

# Randomization

From last recitation: we learned that randomization takes care of selection bias. Steps:

1) Once you define a treatment and control group, check whether randomization "worked."
    - Use stratification. Split the sample by sex (male/female) and schooling (high school/college) into four cells (male/college, etc.). Then randomize into treatment and control within each cell, so both groups are balanced on those characteristics by design.
    - Check that it worked by measuring the population before treatment (a baseline). Treatment and control should give similar results.
2) Do a power calculation: what N do you need to detect a significant effect at a conservative (small) effect size? Don't assume a big effect size, or you'll underestimate N.
3) Handle attrition (people leaving during the study).
    - Power: those data points are lost, so power goes down.
    - Selection: maybe the people who got healthier leave the study early, etc. Think about whether the resulting bias would make you underestimate or overestimate the effect size.
4) Non-compliance: people might not follow your rules (some assigned to treatment don't take it, some controls do). Use the Wald estimator to scale up the effect size: divide the difference in mean outcomes between the groups as assigned by the difference in the fraction of each group actually treated.
5) Contagion: the control group may benefit from the treatment group getting treated. E.g., deworming the treatment group might mean fewer worms spread to people who don't take the medicine.
    - To handle this, treat different fractions of people in different villages, so you can measure the spillover effect, which is interesting in its own right.

# Instruments

Let's say we measure education's effect on wage:

$$\text{wage} = \beta \cdot \text{education} + \epsilon$$

$$\beta = \frac{\operatorname{Cov}(\text{wage}, \text{education})}{\operatorname{Var}(\text{education})}$$

That is, how much wage changes for each one-unit change in education. $\epsilon$ is a noise term. (This ratio is the least-squares slope; it equals the true $\beta$ only if $\epsilon$ is uncorrelated with education.)

But what else can change your wages?

- parents' wealth
- place of residence
- political unrest
- economic environment
- ability at school

All of that is rolled into the noise term right now. Let's consider ability.
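Before bringing ability in, the slope formula $\beta = \operatorname{Cov}(\text{wage}, \text{education}) / \operatorname{Var}(\text{education})$ can be checked on simulated data. This is a minimal sketch under made-up assumptions: the true `beta`, the education range, and the noise scale are hypothetical values, and the noise is drawn independently of education, which is exactly the condition under which the ratio recovers $\beta$.

```python
import random


def cov(x, y):
    """Sample covariance of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)


def ols_slope(y, x):
    """Least-squares slope of y on x, computed as Cov(y, x) / Var(x)."""
    return cov(y, x) / cov(x, x)


random.seed(0)
beta = 2.0  # hypothetical true effect of one extra year of education on wage

# Hypothetical years of education, spread uniformly between 8 and 16
education = [random.uniform(8, 16) for _ in range(10_000)]
# epsilon is drawn independently of education, so the ratio should recover beta
wage = [beta * e + random.gauss(0, 5) for e in education]

print(f"estimated beta: {ols_slope(wage, education):.3f} (true beta: {beta})")
```

If the noise were instead correlated with education, the ratio would pick up that correlation as well; that is the problem the ability derivation makes precise.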
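Rewrite the model as

$$\text{wage} = \beta \cdot \text{education} + \gamma \cdot \text{ability} + \mu$$

where $\mu$ is the new noise term (it no longer includes ability). If we still run the simple regression of wage on education alone, then, writing educ for education and $\hat\beta$ for that estimate:

$$
\begin{aligned}
\hat\beta &= \frac{\operatorname{Cov}(\text{wage}, \text{educ})}{\operatorname{Var}(\text{educ})}
= \frac{\operatorname{Cov}(\beta\,\text{educ} + \gamma\,\text{ability} + \mu,\ \text{educ})}{\operatorname{Var}(\text{educ})} \\
&= \frac{\operatorname{Cov}(\beta\,\text{educ}, \text{educ}) + \operatorname{Cov}(\gamma\,\text{ability}, \text{educ}) + \operatorname{Cov}(\mu, \text{educ})}{\operatorname{Var}(\text{educ})} \\
&= \beta + \gamma\,\frac{\operatorname{Cov}(\text{ability}, \text{educ})}{\operatorname{Var}(\text{educ})} + 0
\end{aligned}
$$

Covariance is linear, so the sum splits into separate terms. $\operatorname{Cov}(\beta\,\text{educ}, \text{educ}) = \beta \operatorname{Var}(\text{educ})$, and the last term is 0 because $\mu$ is uncorrelated with education. (Don't be confused by $\beta$ appearing on both sides: $\hat\beta$ on the left is what the simple regression estimates, and $\beta$ on the right is the true effect of education.)

So if we consider only education's effect on wage and leave ability in the noise term, we attribute part of ability's effect on wages to education. The estimate is off by $\gamma \operatorname{Cov}(\text{ability}, \text{educ}) / \operatorname{Var}(\text{educ})$, which is positive if ability raises wages ($\gamma > 0$) and is positively correlated with education. Policies based on this estimate would push for more education, even though part of the wage gain comes from ability.

So we can't measure the education → wage effect directly (ability is hard to observe, so we can't just add it to the regression). We need to measure something else, called an instrument:

z → education → wage

z is an instrument that affects wage only by way of education. So: z → wage only because z → education.

For example, take quarter of birth. If you want to drop out, you always have to finish your current school year first, so depending on when in the year you were born, you may have to wait an extra year to drop out: if you are born later in the year, you go through one more year of school. Quarter of birth probably can't affect wages other than through years of education, so when we measure this instrument's effect on wages, it can only be working through education. So it's a good instrument.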
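To see both the omitted-ability bias and the instrument at work, here is a minimal simulation under made-up assumptions. Ability is standard normal; `z` is a hypothetical binary "born late in the year" flag (a simplification of quarter of birth) that adds one year of schooling, as in the story above; and `beta`, `gamma` and the noise scales are illustrative values, not estimates from any data. The script compares the naive regression ratio, the prediction `beta + bias` from the derivation, and the standard single-instrument IV estimate $\operatorname{Cov}(z, \text{wage}) / \operatorname{Cov}(z, \text{education})$.

```python
import random


def cov(x, y):
    """Sample covariance of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)


def mean_where(values, z, flag):
    """Mean of values over observations whose instrument z equals flag."""
    picked = [v for v, zi in zip(values, z) if zi == flag]
    return sum(picked) / len(picked)


random.seed(1)
n = 50_000
beta, gamma = 2.0, 3.0  # hypothetical true effects of education and ability

ability = [random.gauss(0, 1) for _ in range(n)]  # unobserved confounder
# Hypothetical binary instrument: 1 = born late in the year, 0 = early
z = [random.randint(0, 1) for _ in range(n)]
# Education rises with ability and with the instrument (assumed +1 year)
education = [12 + a + zi + random.gauss(0, 1) for a, zi in zip(ability, z)]
# mu is independent of everything else; ability is NOT in the naive regression
wage = [beta * e + gamma * a + random.gauss(0, 1) for e, a in zip(education, ability)]

ols = cov(wage, education) / cov(education, education)  # naive, biased upward
bias = gamma * cov(ability, education) / cov(education, education)
iv = cov(z, wage) / cov(z, education)  # instrumental-variables estimate

# With a binary z, the IV ratio equals the Wald estimator:
# gap in mean wage divided by gap in mean education between the z groups
wald = (mean_where(wage, z, 1) - mean_where(wage, z, 0)) / (
    mean_where(education, z, 1) - mean_where(education, z, 0)
)

print(f"OLS {ols:.2f}, beta + bias {beta + bias:.2f}, IV {iv:.2f}, Wald {wald:.2f}")
```

With these assumed numbers, $\operatorname{Var}(\text{educ}) = 1 + 0.25 + 1 = 2.25$ and $\operatorname{Cov}(\text{ability}, \text{educ}) = 1$, so the naive ratio should sit near $2 + 3/2.25 \approx 3.33$, while the IV and Wald values should sit near the true $\beta = 2$. The instrument works here only because the simulation builds in both conditions from the notes: `z` shifts education, and it enters wage through education alone.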