Shubhi
Shubhi

Reputation: 27

Is there a way to calculate percentages comparing observations?

My data set looks like this in Stata

YEAR Gender Presidents
2023 Male 5
2023 Female 6
2023 Total 11
2022 Male 3
2022 Female 2
2022 Total 5

I want to be able to create a new variable/column that gives me the percentages for that year by taking the fraction of male/total and female/total for that particular year. In other words, I want my dataset to have an additional column that looks like this:

YEAR Gender Presidents Perc_pres
2023 Male 5 0.45
2023 Female 6 0.55
2023 Total 11 1
2022 Male 3 0.6
2022 Female 2 0.4
2022 Total 5 1

I tried the egen perc_pres_`gender' = total(cond(gender==Female), 1, .) command to create new 2 new columns for female total and male total and then took percentages but that seems too long a way to get what I want. I just want one column to give me the percentages for both men and women and no additional columns.

Upvotes: 0

Views: 467

Answers (1)

Nick Cox
Nick Cox

Reputation: 37338

Generally, a data layout such as yours is an example of spreadsheet thinking that doesn't match standard good practice in statistical software. Totals based on some observations (rows) should not be held in other observations in the same dataset, but in other variables.

Specifically, your expression Gender == Female at best makes sense if Female is another variable in the dataset. It would be illegal for either of the two other (and more obvious) possibilities, that Gender is a string variable, in which case "Female" is needed, or that Gender is a numeric variable with value labels, in which case a different comparison is needed.

Your apparent use of a local macro gender also doesn't make sense to me.

You are confusing percents (which add to 100) with proportions or fractions (which add to 1).

The code needed here is awkward largely because your data layout is awkward.

Here is one solution, which hinges on the fact that Female Male Total sort in that order.

* Example generated by -dataex-. For more info, type help dataex
clear
input int year str6 gender byte presidents
2023 "Male"    5
2023 "Female"  6
2023 "Total"  11
2022 "Male"    3
2022 "Female"  2
2022 "Total"   5
end

bysort year (gender) : gen percent = 100 * presidents / presidents[_N]

list , sepby(year)

     +-------------------------------------+
     | year   gender   presid~s    percent |
     |-------------------------------------|
  1. | 2022   Female          2         40 |
  2. | 2022     Male          3         60 |
  3. | 2022    Total          5        100 |
     |-------------------------------------|
  4. | 2023   Female          6   54.54546 |
  5. | 2023     Male          5   45.45454 |
  6. | 2023    Total         11        100 |
     +-------------------------------------+

If that doesn't help (enough), please use dataex to give an example (and see the stata tag wiki).

Upvotes: 0

Related Questions