Reputation: 3099
I have the following dataset (individual level data):
pid year state income
1 2000 il 100
2 2000 ms 200
3 2000 al 30
4 2000 dc 400
5 2000 ri 205
1 2001 il 120
2 2001 ms 230
3 2001 al 50
4 2001 dc 400
5 2001 ri 235
.........etc.......
I need to estimate average income for each state in each year and
create a new dataset that would look like this:
state year average_income
ar 2000 150
ar 2001 200
ar 2002 250
il 2000 150
il 2001 160
il 2002 160
...........etc...............
I already have a code that runs perfectly fine (I have two loops). However, I would like to know is there any better way in Stata like sql style query?
Upvotes: 1
Views: 608
Reputation: 37208
This is shorter code than any suggested so far:
collapse average_income=income, by(state year)
Upvotes: 11
Reputation: 1269703
You have the SQL tag on the question. This is a basic aggregation query in SQL:
select state, year, avg(income) as average_income
from t
group by state, year;
To put this in a table, depends on your database. One of the following typically works:
create table NewTable as
select state, year, avg(income) as average_income
from t
group by state, year;
Or:
select state, year, avg(income) as average_income
into NewTable
from t
group by state, year;
Upvotes: 1
Reputation: 9768
This shouldn't need 2 loops, or any for that matter. There are in fact more efficient ways to do this. When you are repeating an operation on many groups, the bysort
command is useful:
bysort year state: egen average_income = mean(income)
You also don't have to create a new dataset, you can just prune this one and save it. Start by only keeping the variables you want (state, year and average_income) and get rid of duplicates:
keep state year average_income
duplicates drop
save "mynewdataset.dta"
Upvotes: 2