Reputation: 249
I have a lot of variables for which I need descriptive statistics (means). However, as columns I want to use values of categorical variable (AlcCons1
).
I used the following code in order to do this:
tabstat Age25_29 Age30_34 ... SmokeY religAtndY, statistics( mean ) by(AlcCons1)
And I have got the result like this:
AlcCons1 | Age25_29 Age30_34 Age35_39 Age40_44 Age45_49 Age50_54 Age55_59
---------+----------------------------------------------------------------------
1 | .0987326 .0936242 .1243994 .1668614 .1579665 .1481626 .1258278
2 | .1037879 .11853 .1451863 .1415631 .1317288 .1231884 .1387164
3 | .0905679 .1151016 .1405161 .1624963 .1506231 .137278 .123246
4 | .0649853 .0716117 .1094201 .1606857 .1786286 .1630888 .1401794
---------+----------------------------------------------------------------------
Total | .091001 .0986022 .1286311 .1617972 .156643 .144962 .1289952
------------------------------
How can I swap columns and rows? (transpose the table)
Upvotes: 0
Views: 1447
Reputation: 249
I found the answer under following link: https://www.stata.com/statalist/archive/2005-09/msg00561.html I tried to transpose the table, so I installed the command:
ssc install tabstatmat, replace
tabstat Age25_29 Age30_34 CurntSmokeY religAtndY, by(AlcCons1) stat(mean) col(stat) long format(%9.2f) save
qui tabstatmat B
matrix B = B'
matrix list B, f(%9.2f)
And I have got what I needed:
B[41,5]
1: 2: 3: 4: Total:
mean mean mean mean mean
Age25_29 0.10 0.10 0.09 0.06 0.09
Age30_34 0.09 0.12 0.12 0.07 0.10
Age35_39 0.12 0.15 0.14 0.11 0.13
Age40_44 0.17 0.14 0.16 0.16 0.16
The question now is how to make it look better (delete "mean", Change 1,2,3,4 by words) and then to use putexcel
command?
Upvotes: 0
Reputation: 37208
In principle, the answer is c(statistics)
. For this kind of example, that is legal and it produces a kind of transposition, but the result is not an exact transposition. Here is one way to do something better.
There is no reproducible example in the question, so we need to find one.
Using means is incidental. The same issue would arise with any other statistic.
This is the kind of table we might want to transpose.
. sysuse census, clear
(1980 Census data by state)
. tabstat poplt5-pop65p , s(p50) by(region)
Summary statistics: p50
by categories of: region (Census region)
region | poplt5 pop5_17 pop18p pop65p
--------+----------------------------------------
NE | 185188 637731 2284657 364864
N Cntrl | 327094.5 936449 3126055 521880.5
South | 289571.5 880546 2803536 407053.5
West | 114731 303176 884987 109220
--------+----------------------------------------
Total | 227467.5 629654 2175130 370495
-------------------------------------------------
Trick 1: Simplify the problem by getting a dataset consisting only of what we want to tabulate.
. collapse (p50) poplt5-pop65p, by(region)
. l
+---------------------------------------------------------+
| region poplt5 pop5_17 pop18p pop65p |
|---------------------------------------------------------|
1. | NE 185,188 637,731 2,284,657 364,864 |
2. | N Cntrl 327,094.5 936,449 3,126,054.5 521,880.5 |
3. | South 289,571.5 880,546 2,803,536 407,053.5 |
4. | West 114,731 303,176 884,987 109,220 |
+---------------------------------------------------------+
Trick 2: Use a reshape
to map the distinct variables for different categories to a single categorical variable.
. reshape long pop, i(region) j(age) string
(note: j = 18p 5_17 65p lt5)
Data wide -> long
-----------------------------------------------------------------------------
Number of obs. 4 -> 16
Number of variables 5 -> 3
j variable (4 values) -> age
xij variables:
pop18p pop5_17 ... poplt5 -> pop
-----------------------------------------------------------------------------
. l, sepby(region)
+------------------------------+
| region age pop |
|------------------------------|
1. | NE 18p 2,284,657 |
2. | NE 5_17 637,731 |
3. | NE 65p 364,864 |
4. | NE lt5 185,188 |
|------------------------------|
5. | N Cntrl 18p 3,126,054.5 |
6. | N Cntrl 5_17 936,449 |
7. | N Cntrl 65p 521,880.5 |
8. | N Cntrl lt5 327,094.5 |
|------------------------------|
9. | South 18p 2,803,536 |
10. | South 5_17 880,546 |
11. | South 65p 407,053.5 |
12. | South lt5 289,571.5 |
|------------------------------|
13. | West 18p 884,987 |
14. | West 5_17 303,176 |
15. | West 65p 109,220 |
16. | West lt5 114,731 |
+------------------------------+
Trick 3: Use tabdisp
directly.
. tabdisp age region, c(pop)
--------------------------------------------------------------
| Census region
age | NE N Cntrl South West
----------+---------------------------------------------------
18p | 2,284,657 3,126,054.5 2,803,536 884,987
5_17 | 637,731 936,449 880,546 303,176
65p | 364,864 521,880.5 407,053.5 109,220
lt5 | 185,188 327,094.5 289,571.5 114,731
--------------------------------------------------------------
Trick 4: Some cleaning up may be needed.
. label def age 1 lt5 2 5_17 3 18p 4 65p
. encode age , gen(ageclass) label(age)
. tab ageclass
ageclass | Freq. Percent Cum.
------------+-----------------------------------
lt5 | 4 25.00 25.00
5_17 | 4 25.00 50.00
18p | 4 25.00 75.00
65p | 4 25.00 100.00
------------+-----------------------------------
Total | 16 100.00
. label def age 1 "<5" 2 "5-17" 3 "18-64" 4 "65+", modify
. tabdisp ageclass region, c(pop)
--------------------------------------------------------------
| Census region
ageclass | NE N Cntrl South West
----------+---------------------------------------------------
<5 | 185,188 327,094.5 289,571.5 114,731
5-17 | 637,731 936,449 880,546 303,176
18-64 | 2,284,657 3,126,054.5 2,803,536 884,987
65+ | 364,864 521,880.5 407,053.5 109,220
--------------------------------------------------------------
Upvotes: 1