somebody somewhere

Reputation: 11

Stata: replace missing values with the values from observations with the same ID number

I am using population-level American Community Survey (ACS) data to look at factors that affect income from self-employment, with a primary interest in the female population. I want to create a variable measuring "husband's income." The variable pincp measures a person's total income, and I have created dummy variables married and female. Each household is identified by a unique identifier, serialno, which is shared by all of its members. I am using Stata.

Universe: people aged 18 and older whose primary job is self-employment. They must have earned at least $1,000 from self-employment in the past year and be below the 95th percentile of self-employment earnings.

Assuming that a married male in a household is a husband**, I start with:

gen husb_income = pincp if female==0 & married==1

How do I copy the value of husb_income to the other observations with the same serialno? If a household contains an (employed) married man, I want husb_income to hold his income for every observation in that household.

** I know that this is a gratuitous assumption; I'm not concerned with that right now.
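One common Stata idiom for this step is sketched below on made-up toy data (the households and incomes are hypothetical, and it assumes at most one married male per household). Because Stata sorts missing values after all numbers, sorting each household by husb_income puts the married male's value in the first row, which can then be copied to every row of that household:

```stata
* Hypothetical toy data: household 1 has a married male, household 2 has none
clear
input serialno female married pincp
1 0 1 50000
1 1 1 20000
1 0 0  5000
2 1 0 30000
2 0 0 10000
end

* Step from the question: income only on the married male's row
gen husb_income = pincp if female==0 & married==1

* Within each household, nonmissing values sort before missing ones,
* so row 1 holds the husband's income (if any); copy it to all rows
bysort serialno (husb_income): replace husb_income = husb_income[1]

list, sepby(serialno)
```

Households without a married male stay missing. If a household has several married males, this copies the smallest of their incomes; egen's max() function with by(serialno) is an alternative that copies the largest and likewise leaves all-missing households missing.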

Upvotes: 1

Views: 2170

Answers (2)

StasK

Reputation: 1555

I would go with something like

egen husb_income = total( pincp*(female==0)*(married==1) ), by(serialno)
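A sketch of how the egen total() line behaves on hypothetical toy data. One caveat worth knowing: total() treats missing as 0, so a household with no married male gets 0 rather than missing. The extra n_husb variable below is an illustrative helper, not part of the original answer, used to set those households back to missing:

```stata
* Hypothetical toy data: household 2 has no married male
clear
input serialno female married pincp
1 0 1 50000
1 1 1 20000
2 1 0 30000
2 0 0 10000
end

* Rows that are not married males contribute pincp*0 = 0 to the total
egen husb_income = total(pincp*(female==0)*(married==1)), by(serialno)

* Count married males per household; where there are none, the total
* above is 0, so recode it to missing
egen n_husb = total(female==0 & married==1), by(serialno)
replace husb_income = . if n_husb == 0
```

If husb_income already exists from the question's gen command, drop it (or pick a new name) first, because egen will not overwrite an existing variable.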

If that is too rough, you could write more detailed code along the lines of

bysort serialno (female) : gen husb_income = pincp[1] * (_N == 2) * (female[1] == 0) * (female == 1)
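A sketch of the two-person-household approach on hypothetical toy data. Sorting by female within serialno puts a male (female==0) in row 1 whenever the household has one; the female[1]==0 factor guards against two-woman households, and the female==1 factor assigns the husband's income to the wife's row. Every other row gets 0:

```stata
* Hypothetical toy data: a couple, a two-woman household, and a 3-person household
clear
input serialno female pincp
1 0 50000
1 1 20000
2 1 30000
2 1 25000
3 0 40000
3 1 15000
3 1  1000
end

* Row 1 of each household is male if any male is present
bysort serialno (female): gen husb_income = pincp[1] * (_N == 2) * (female[1] == 0) * (female == 1)

list, sepby(serialno)
```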

for nuclear families consisting of only the husband and wife. If you are not familiar with these constructs, read about them in the manual and in Nick Cox's column (http://www.stata-journal.com/article.html?article=pr0004).

ACS data contain detailed linkages between family members, so you should be able to use them to identify exactly who the husband of the woman in question is.

Upvotes: 2

Penguin_Knight

Reputation: 1299

Keep only the observations for married males, and drop all variables except serialno and pincp. Rename pincp to husb_income and save the result as a separate dataset.
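A minimal sketch of these steps on hypothetical toy data. The tempfile name husbands is illustrative; with real data you would save to a permanent file instead. preserve and restore bring the full person-level data back after the husband file is written:

```stata
* Hypothetical person-level toy data
clear
input serialno female married pincp
1 0 1 50000
1 1 1 20000
2 1 0 30000
end

tempfile husbands

preserve
* Keep only married males, then only the merge key and their income
keep if female==0 & married==1
keep serialno pincp
rename pincp husb_income
save `husbands'
restore
```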

Then open the original dataset and use the merge command to bring the husband data back in:

use originalData, clear
merge m:1 serialno using c:\temp\whateverTheHusbandFileIsCalled
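A self-contained sketch of the merge on hypothetical toy data, using a tempfile in place of the husband file. After the merge, _merge==1 marks people whose household has no married male (their husb_income is missing), and _merge==2 would mark husband-file households with no match in the person data:

```stata
* Hypothetical husband file: one row per household with a married male
clear
input serialno husb_income
1 50000
end
tempfile husbands
save `husbands'

* Hypothetical person-level data
clear
input serialno female married pincp
1 0 1 50000
1 1 1 20000
2 1 0 30000
end

merge m:1 serialno using `husbands'
tab _merge

* Drop unmatched rows from the husband file, then the indicator
drop if _merge == 2
drop _merge
```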

Also, a household may contain more than one married male. In that case the command above will fail, because serialno will no longer uniquely identify observations in the husband file, which turns it into a many-to-many merge. You would then have to generate an extra couple identifier and include it in the merge key alongside serialno.
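Before merging, one way to find such households is to tag duplicate serialno values in the husband file. A sketch on hypothetical toy data, in which household 2 has two married males; n_dup is an illustrative variable name:

```stata
* Hypothetical husband file where household 2 has two married males
clear
input serialno husb_income
1 50000
2 40000
2 35000
end

* n_dup counts the other rows sharing each serialno (0 = unique)
duplicates tag serialno, generate(n_dup)
list serialno husb_income if n_dup > 0

* isid exits with an error when serialno is not unique; capture keeps
* the do-file running and stores the return code in _rc
capture isid serialno
if _rc {
    display "serialno is not unique: merge m:1 would fail"
}
```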

Upvotes: 1
