dds ss
dds ss

Reputation: 49

Stata tabulation with zeros

I have a problem with Stata and the command:

svy: tab x

when there are no observations for a given x.

My problem is that when there are no observations for a category Stata simply drops the corresponding row.

My task is to run several tabulations and to save and export the key results to a csv file. Sometimes the stored vectors have n elements while sometimes because of the zeros they have only n-1 elements so I don't know how to combine them in a larger matrix (or at least export it in a file with a regular spacing between rows and with a value of 0 if there are not observations). I have also tried the

estpost svy, subpop(x0): tab x, count se format(%10.4g)

But I still have the same problem.

Upvotes: 1

Views: 1739

Answers (1)

Steve Samuels
Steve Samuels

Reputation: 908

Update 3 This solution is based on estpost svy: tab because that command returns more usable result vectors than does svy: tab itself. Like the previous version, this solution puts all those results into a Stata data set. It adds a check for whether the data contain missing categories before resorting to the loop and tightens the loop limits slightly. Following Nick's suggestion, missing values are substituted for all standard-error related statistics. Note that

 estpost svy: tab rep78

by default puts the estimated cell proportions into e(b) and their standard errors into e(se), while

 estpost svy: tab rep78, count

puts the estimated counts and their SEs into those matrices. However the other summary is still available, either in e(cell) or e(count).

 sysuse auto, clear
 drop if rep78==2 |rep78==5

 svyset _n [pw = turn]
 estpost svy: tab rep78,  se

 /* Number categories from 1 to max */
 local maxcat = 5
 mata:
 /* count rows, add one for totals row
   assign the category for that row as .a */
 r = (st_matrix("e(Row)"), .a)'
 b = st_matrix("e(b)")'
 serr = st_matrix("e(se)")'
 lb = st_matrix("e(lb)")'
 ub = st_matrix("e(ub)")'
 def = st_matrix("e(deff)")'
 dft = st_matrix("e(deft)")'
 ct = st_matrix("e(count)")'
 pr = st_matrix("e(cell)")'
 obs = st_matrix("e(obs)")'
 d1 =(r , b, serr, lb, ub, def, dft, obs, pr, ct)

 /*  Where there are no totals, use a standard missing value */
 d1[rows(d1),3::7] = J(1,5, .)

 /* Check if there are no missing rows.
 If so, output the original returned matrices */
 if (`e(r)' ==`maxcat') d = d1
 /* Else create a zero matrix and populate it
 with statistics for the non-missing categories*/
 else {   
     d2= J(`maxcat',10,0)
     d2[.,1] =(1::`maxcat')
     for (j = 1; j<=`e(r)'; j++) {
        for (k = 1; k<=r[j,1]; k++) {
           if (r[j,1]== k) {
                   d2[k,2] = b[j,1]
                   d2[k,3] = serr[j,1]
                   d2[k,4] = lb[j,1]
                   d2[k,5] = ub[j,1]
                   d2[k,6] = def[j,1]
                   d2[k,7] = dft[j,1]
                   d2[k,8] = obs[j,1]
                   d2[k,9] = pr[j,1]
                   d2[k,10] = ct[j,1]
            }
        }
    }
 /* If rows are missing set SE-realated stats to missing*/
   for (k = 1; k<=`maxcat'; k++) {
      if (d2[k,2] == 0)  d2[k,3..7] =J(1,5,.)
    }
 /* Now add the totals row */
 d = d2 \ d1[rows(d1),.]
 }
 end
 clear
 getmata (rep78 b se lb ub deff deft nobs prop count ) = d
 format  b se lb ub deff deft prop  %5.2f
 format nobs count %10.0gc
 label define rtot  .a "Totals"
 label values rep78 rtot
 list
 save results, replace

Original Answer Here's a way to create a matrix new that will include the zero categories. The logic: set up a matrix of zeros to hold results for all the categories; then replace the zeros with values from the non-missing categories.The macro maxcat contains the maximum number of categories for the tabulated variable. The code assumes that the categories in the tabled variables are integers from 1 to maxcat. The mata block extracts the vector of standard errors, and the scalar e(r)holds the number of rows in the actual table.

 sysuse auto, clear
 svyset _n
 drop if rep78== 2 | rep78==5
 svy: tab rep78, count se

 local  maxcat = 5  //max no. of categories
 matrix  oldr = e(Row)'   // category values
 matrix ct = e(Obs)  // table counts

 // serr is a vector of std. errors
 mata: st_matrix("serr", sqrt(diagonal(st_matrix("e(V)"))))

 // matrix new  will hold the expanded results
 matrix new = J(`maxcat', 3, 0)

 forvalues j = 1/`=e(r)' {
 forvalues k = 1/`maxcat'{
 matrix new[`k',1] = `k'
 if oldr[`j',1]== `k'  {
 matrix new[`k',2] = ct[`j',1]
 matrix new[`k',3] = serr[`j',1]
 }
 }
 }
 matrix list new

Update 2: Here's a version that does most of the work in Mata, then saves the estimates into a Stata data set. I've changed the names of the matrices slightly.

 sysuse auto, clear
 svyset _n
 drop if rep78== 2 | rep78==5
 svy: tab rep78, count se
 local maxcat =5

 mata:
 r = st_matrix("e(Row)")'
 ct = st_matrix("e(Obs)")
 serr= sqrt(diagonal(st_matrix("e(V)")))
 d = J(`maxcat',3,0)
 for (j = 1; j<=`e(r)'; j++) {
     for (k = 1; k<=`maxcat'; k++) {
         d[k,1] = k
         if (r[j,1]== k) {
            d[k,2] =   ct[j,1]
            d[k,3] = serr[j,1]
         }
    }
  }
 end
 clear
 getmata (rep78 count se) = d
 replace se = . if count==0
 format se %8.2f
 list
 save results, replace

Upvotes: 1

Related Questions