How to produce a table with statistics according to categorical variable?

Question

I have a lot of variables for which I need descriptive statistics (means). However, as columns I want to use values of categorical variable (AlcCons1).

I used the following code in order to do this:

tabstat Age25_29 Age30_34 ... SmokeY religAtndY, statistics( mean ) by(AlcCons1) And I have got the result like this:

AlcCons1 | Age25_29 Age30_34 Age35_39 Age40_44 Age45_49 Age50_54 Age55_59 ---------+---------------------------------------------------------------------- 1 | .0987326 .0936242 .1243994 .1668614 .1579665 .1481626 .1258278 2 | .1037879 .11853 .1451863 .1415631 .1317288 .1231884 .1387164 3 | .0905679 .1151016 .1405161 .1624963 .1506231 .137278 .123246 4 | .0649853 .0716117 .1094201 .1606857 .1786286 .1630888 .1401794 ---------+---------------------------------------------------------------------- Total | .091001 .0986022 .1286311 .1617972 .156643 .144962 .1289952 ------------------------------

How can I swap columns and rows? (transpose the table)

Nick Cox · Accepted Answer

In principle, the answer is c(statistics). For this kind of example, that is legal and it produces a kind of transposition, but the result is not an exact transposition. Here is one way to do something better.

There is no reproducible example in the question, so we need to find one.

Using means is incidental. The same issue would arise with any other statistic.

This is the kind of table we might want to transpose.

. sysuse census, clear
(1980 Census data by state)

. tabstat poplt5-pop65p , s(p50) by(region)

Summary statistics: p50
  by categories of: region (Census region)

 region |    poplt5   pop5_17    pop18p    pop65p
--------+----------------------------------------
     NE |    185188    637731   2284657    364864
N Cntrl |  327094.5    936449   3126055  521880.5
  South |  289571.5    880546   2803536  407053.5
   West |    114731    303176    884987    109220
--------+----------------------------------------
  Total |  227467.5    629654   2175130    370495
-------------------------------------------------

Trick 1: Simplify the problem by getting a dataset consisting only of what we want to tabulate.

. collapse (p50) poplt5-pop65p, by(region)

. l

     +---------------------------------------------------------+
     | region       poplt5   pop5_17        pop18p      pop65p |
     |---------------------------------------------------------|
  1. | NE          185,188   637,731     2,284,657     364,864 |
  2. | N Cntrl   327,094.5   936,449   3,126,054.5   521,880.5 |
  3. | South     289,571.5   880,546     2,803,536   407,053.5 |
  4. | West        114,731   303,176       884,987     109,220 |
     +---------------------------------------------------------+

Trick 2: Use a reshape to map the distinct variables for different categories to a single categorical variable.

. reshape long pop, i(region) j(age) string
(note: j = 18p 5_17 65p lt5)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                        4   ->      16
Number of variables                   5   ->       3
j variable (4 values)                     ->   age
xij variables:
              pop18p pop5_17 ... poplt5   ->   pop
-----------------------------------------------------------------------------

. l, sepby(region)

     +------------------------------+
     | region     age           pop |
     |------------------------------|
  1. | NE         18p     2,284,657 |
  2. | NE        5_17       637,731 |
  3. | NE         65p       364,864 |
  4. | NE         lt5       185,188 |
     |------------------------------|
  5. | N Cntrl    18p   3,126,054.5 |
  6. | N Cntrl   5_17       936,449 |
  7. | N Cntrl    65p     521,880.5 |
  8. | N Cntrl    lt5     327,094.5 |
     |------------------------------|
  9. | South      18p     2,803,536 |
 10. | South     5_17       880,546 |
 11. | South      65p     407,053.5 |
 12. | South      lt5     289,571.5 |
     |------------------------------|
 13. | West       18p       884,987 |
 14. | West      5_17       303,176 |
 15. | West       65p       109,220 |
 16. | West       lt5       114,731 |
     +------------------------------+

Trick 3: Use tabdisp directly.

. tabdisp age region, c(pop)

--------------------------------------------------------------
          |                   Census region                   
      age |          NE      N Cntrl        South         West
----------+---------------------------------------------------
      18p |   2,284,657  3,126,054.5    2,803,536      884,987
     5_17 |     637,731      936,449      880,546      303,176
      65p |     364,864    521,880.5    407,053.5      109,220
      lt5 |     185,188    327,094.5    289,571.5      114,731
--------------------------------------------------------------

Trick 4: Some cleaning up may be needed.

. label def age 1 lt5 2 5_17 3 18p 4 65p

. encode age , gen(ageclass) label(age)

. tab ageclass

   ageclass |      Freq.     Percent        Cum.
------------+-----------------------------------
        lt5 |          4       25.00       25.00
       5_17 |          4       25.00       50.00
        18p |          4       25.00       75.00
        65p |          4       25.00      100.00
------------+-----------------------------------
      Total |         16      100.00

. label def age 1 "<5" 2 "5-17" 3 "18-64" 4 "65+", modify

. tabdisp ageclass region, c(pop)

--------------------------------------------------------------
          |                   Census region                   
 ageclass |          NE      N Cntrl        South         West
----------+---------------------------------------------------
       <5 |     185,188    327,094.5    289,571.5      114,731
     5-17 |     637,731      936,449      880,546      303,176
    18-64 |   2,284,657  3,126,054.5    2,803,536      884,987
      65+ |     364,864    521,880.5    407,053.5      109,220
--------------------------------------------------------------

How to produce a table with statistics according to categorical variable?

Answers (2)

Related Questions