Reputation: 179
I have a data set of various individuals that has 2 variables, income and birth year. I want to make a line graph which has birth year on the x-axis and the average value of income for people born in that year on the y-axis.
Try as I might, I just can't get it to work. I tried the twoway command, and even tried making a histogram, but neither calculates the mean. How do I code this? Is there a way I can create another variable that stores the mean value corresponding to each year?
Upvotes: 0
Views: 1578
Reputation: 9460
There's definitely more than one way to skin this cat, but here are two I use regularly. Personally, I prefer to use a regression for this, but you can also use extended generate (egen), as Roberto suggested in the comments to your post. Sometimes the egen approach takes a while to render if the data set is large (though there are tricks to avoid this that I will not get into).
Here's an example with some data that resembles yours:
/* Get some data */
webuse set "http://www.stata-press.com/data/musr"
webuse "mus02psid92m.dta", clear
/* (1) With egen */
bysort age: egen mean_earnings_by_age = mean(earnings)
twoway (connected mean_earnings_by_age age)
/* (2) Using Regression */
regress earnings i.age
margins age
marginsplot, noci
/* Check that (1) and (2) are the same */
marginsplot, noci addplot(connected mean_earnings_by_age age)
webuse set // reset webuse to default
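On the slow rendering with egen: the graph in (1) draws one marker per observation, even though every person of the same age shares the same mean. One possible way around that (a sketch, and not necessarily the trick alluded to above) is to flag a single observation per age with egen's tag() function and plot only the flagged rows. The variable names mean_earnings and one_per_age are made up for this illustration, and age stands in for your birth year variable.

```stata
* Same example data as above
webuse set "http://www.stata-press.com/data/musr"
webuse "mus02psid92m.dta", clear
webuse set

* Group mean stored on every observation
bysort age: egen mean_earnings = mean(earnings)

* Flag exactly one observation per distinct age (1 = flagged, 0 = not)
egen byte one_per_age = tag(age)

* Plot only the flagged rows, so each age contributes one point
twoway connected mean_earnings age if one_per_age, sort
```

With your own data, the same pattern should carry over by swapping earnings and age for your income and birth year variables.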
Upvotes: 2