CHEBURASHKA
CHEBURASHKA

Reputation: 1713

Stata. How to use if statement with sum()?

I am trying to execute the following code:

    forval i = 1/51 {

    // number of households
    by hhid, sort: gen nvals = _n==1
    count if (nvals & stateID == `i')
    local stateTotalHH = r(N)

        local avPersonHH`i' = sum(numper)/`stateTotalHH' if(nvals & stateID ==`i')
        drop nvals 
}

Everything works fine except if is not allowed with sum(). How can I estimate the total or the sum of all values in numper variable for each state and at household level?

ps:

  1. I cannot use collapse numper, by(stateID) because I have other estimations
  2. also, I cannot do the following: duplicates drop hhid, force

Upvotes: 1

Views: 11643

Answers (1)

Nick Cox
Nick Cox

Reputation: 37183

Your problem does not even call for sum() with if, so it is best to start at the beginning.

Reconstructing your problem, which is not well explained,

  1. You have observations for individuals within households (identifier hhid) within 50 states of the USA and the District of Columbia (identifier stateID).

  2. You have a variable numper, the number of persons per household, and you want the average per state.

  3. Observations are repeated for each individual in a household, so it is necessary to use just one observation per household.

You can tag each household once by

  egen tag = tag(hhid) 

The average as a new variable would be

 egen avPersonHH = mean(numper/tag), by(stateID) 

Stata is going to average numper/tag which variously will be numper/1 and numper/0; the missings from the latter division will just be ignored, which is what is wanted.

That variable is repeated for each household. To see just one value for each stateID,

 tabdisp stateID, cell(avPersonHH) 

What is wrong with your code? Here is a partial list:

a. No loop is required.

b. If it were, the statement by hhid, sort: gen nvals = _n==1 should not be repeated.

c. sum() is a function for cumulative sums across observations, not what you want here.

d. The line

  local avPersonHH`i' = sum(numper)/`stateTotalHH' if(nvals & stateID ==`i')

would at best calculate one number, but the if condition is misplaced. if whatever local ... often makes sense in Stata, but putting if on the right of a local definition only makes sense for manipulating text containing commands.

Your comment on this line misses these basic misconceptions, c. and d.

e. You were aiming to have collected 51 values of averages in as many local macros, but still need to put them somewhere useful.

f. Separate calculation of totals and numbers is not required, as you can get Stata to calculate the mean for you.

(LATER) This code plays along step by step with your aversion to using collapse and duplicates, the grounds for which are not stated. But most experienced Stata users would be happy to use brute force:

   duplicates drop hhid, force 
   collapse numper, by(stateID) 

and then merge back. That solution is not only direct, but also uses fewer idiosyncratic Stata details, which can take time to figure out.

Upvotes: 6

Related Questions