Reputation: 1713
I am trying to execute the following code:
forval i = 1/51 {
// number of households
by hhid, sort: gen nvals = _n==1
count if (nvals & stateID == `i')
local stateTotalHH = r(N)
local avPersonHH`i' = sum(numper)/`stateTotalHH' if(nvals & stateID ==`i')
drop nvals
}
Everything works fine except if
is not allowed with sum()
. How can I estimate the total or the sum of all values in numper
variable for each state and at household level?
ps:
collapse numper, by(stateID)
because I have other estimationsduplicates drop hhid, force
Upvotes: 1
Views: 11643
Reputation: 37183
Your problem does not even call for sum()
with if
, so it is best to start at the beginning.
Reconstructing your problem, which is not well explained,
You have observations for individuals within households (identifier hhid
) within 50 states of the USA and the District of Columbia (identifier stateID
).
You have a variable numper
, the number of persons per household, and you want the average per state.
Observations are repeated for each individual in a household, so it is necessary to use just one observation per household.
You can tag each household once by
egen tag = tag(hhid)
The average as a new variable would be
egen avPersonHH = mean(numper/tag), by(stateID)
Stata is going to average numper/tag
which variously will be numper/1
and numper/0
; the missings from the latter division will just be ignored, which is what is wanted.
That variable is repeated for each household. To see just one value for each stateID
,
tabdisp stateID, cell(avPersonHH)
What is wrong with your code? Here is a partial list:
a. No loop is required.
b. If it were, the statement by hhid, sort: gen nvals = _n==1
should not be repeated.
c. sum()
is a function for cumulative sums across observations, not what you want here.
d. The line
local avPersonHH`i' = sum(numper)/`stateTotalHH' if(nvals & stateID ==`i')
would at best calculate one number, but the if
condition is misplaced. if
whatever local
... often makes sense in Stata, but putting if
on the right of a local
definition only makes sense for manipulating text containing commands.
Your comment on this line misses these basic misconceptions, c. and d.
e. You were aiming to have collected 51 values of averages in as many local macros, but still need to put them somewhere useful.
f. Separate calculation of totals and numbers is not required, as you can get Stata to calculate the mean for you.
(LATER) This code plays along step by step with your aversion to using collapse
and duplicates
, the grounds for which are not stated. But most experienced Stata users would be happy to use brute force:
duplicates drop hhid, force
collapse numper, by(stateID)
and then merge
back. That solution is not only direct, but also uses fewer idiosyncratic Stata details, which can take time to figure out.
Upvotes: 6