zlqs1985

Reputation: 529

How to create "non-standard" descriptive statistics more efficiently in Stata

Say I want to create some scalar values such as median price/median income or mean downpayment/house price. I know I can first use the su command, extract the numerators and denominators separately from the r-class results, and then create the desired scalars.

However, when I have a dozen such scalars, each computed for different household types, that approach is tedious in practice. Is there a way to do this more efficiently? If I could create a table containing such scalars within Stata, that would be even better.

Upvotes: 1

Views: 146

Answers (1)

Nick Cox

Reputation: 37233

Executive summary: Don't use scalars; use variables instead.

There is a prior statistical issue, which is that (say) summary(y) / summary(x) is not necessarily equal to summary(y/x); in general, the two will differ. It seems to me that the latter usually makes more sense, but I set that aside here.
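A minimal sketch of that difference, using two made-up observations (the data and the names ratio, mean_y and mean_x are purely illustrative): with y = (1, 4) and x = (1, 2), mean(y)/mean(x) is 2.5/1.5, about 1.67, while mean(y/x) is (1 + 2)/2 = 1.5.

```stata
* Toy data, invented for illustration only
clear
input y x
1 1
4 2
end

* The ratio observation by observation
generate ratio = y/x

* Ratio of the means: summarize each variable and keep r(mean)
summarize y, meanonly
scalar mean_y = r(mean)
summarize x, meanonly
scalar mean_x = r(mean)

* Mean of the ratios: summarize the ratio variable itself
summarize ratio, meanonly

display "mean(y)/mean(x) = " mean_y/mean_x
display "mean(y/x)       = " r(mean)
```

This is also the scalar-by-scalar route the question describes; it is fine for one statistic but scales poorly across many statistics and groups.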

Here is one not too crazy example. How much do you have to pay (in US dollars circa 1978) per pound weight (physicists: mass, really) for various cars in the Stata auto dataset?

. sysuse auto
(1978 Automobile Data)

. gen pricePERlb = price/weight

. egen mean = mean(pricePERlb), by(rep78)

. tabstat mean, s(n mean) by(rep78)

Summary for variables: mean
     by categories of: rep78 (Repair Record 1978)

   rep78 |         N      mean
---------+--------------------
       1 |         2  1.479266
       2 |         8  1.731407
       3 |        30  1.895855
       4 |        18   2.25233
       5 |        11  2.472519
---------+--------------------
   Total |        69  2.049639
------------------------------

Now here's a small twist. The generate wasn't needed here. We could have gone straight to egen mean = mean(price/weight), by(rep78), because egen's mean() accepts an expression.

The tools are all trivial: generate to create new variables, egen to create new variables that here can be summary statistics calculated for groups, and tabstat, among many other tabulation commands, to show results. Since the statistics here are by construction constant within groups, asking for their mean is just one of several ways of getting at them. Similarly, graph dot, graph hbar, etc. are immediate for display.
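For the other kind of statistic in the question, a ratio of two group summaries such as median/median, the same tools apply. The following is a sketch rather than a definitive recipe; price and weight stand in for the question's variables, and the names medprice, medweight and medratio are made up for illustration.

```stata
sysuse auto, clear

* Group medians stored as variables, constant within each rep78 group
egen medprice  = median(price),  by(rep78)
egen medweight = median(weight), by(rep78)

* Ratio of the two group medians, also constant within groups
generate medratio = medprice/medweight

* One row per group; the mean of a group-constant variable is that constant
tabstat medratio, s(n mean) by(rep78)

* The same values as a dot chart
graph dot (mean) medratio, over(rep78)
```

Adding a dozen such statistics means a dozen egen and generate lines, and tabstat accepts several variables at once, so they can all be shown in one table.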

Upvotes: 2
