Reputation: 27
My data set looks like this in Stata
YEAR | Gender | Presidents |
---|---|---|
2023 | Male | 5 |
2023 | Female | 6 |
2023 | Total | 11 |
2022 | Male | 3 |
2022 | Female | 2 |
2022 | Total | 5 |
I want to be able to create a new variable/column that gives me the percentages for that year by taking the fraction of male/total and female/total for that particular year. In other words, I want my dataset to have an additional column that looks like this:
YEAR | Gender | Presidents | Perc_pres |
---|---|---|---|
2023 | Male | 5 | 0.45 |
2023 | Female | 6 | 0.55 |
2023 | Total | 11 | 1 |
2022 | Male | 3 | 0.6 |
2022 | Female | 2 | 0.4 |
2022 | Total | 5 | 1 |
I tried the egen perc_pres_`gender' = total(cond(gender==Female), 1, .) command to create new 2 new columns for female total and male total and then took percentages but that seems too long a way to get what I want. I just want one column to give me the percentages for both men and women and no additional columns.
Upvotes: 0
Views: 467
Reputation: 37338
Generally, a data layout such as yours is an example of spreadsheet thinking that doesn't match standard good practice in statistical software. Totals based on some observations (rows) should not be held in other observations in the same dataset, but in other variables.
Specifically, your expression Gender == Female
at best makes sense if Female
is another variable in the dataset. It would be illegal for either of the two other (and more obvious) possibilities, that Gender
is a string variable, in which case "Female"
is needed, or that Gender
is a numeric variable with value labels, in which case a different comparison is needed.
Your apparent use of a local macro gender
also doesn't make sense to me.
You are confusing percents (which add to 100) with proportions or fractions (which add to 1).
The code needed here is awkward largely because your data layout is awkward.
Here is one solution, which hinges on the fact that Female
Male
Total
sort in that order.
* Example generated by -dataex-. For more info, type help dataex
clear
input int year str6 gender byte presidents
2023 "Male" 5
2023 "Female" 6
2023 "Total" 11
2022 "Male" 3
2022 "Female" 2
2022 "Total" 5
end
bysort year (gender) : gen percent = 100 * presidents / presidents[_N]
list , sepby(year)
+-------------------------------------+
| year gender presid~s percent |
|-------------------------------------|
1. | 2022 Female 2 40 |
2. | 2022 Male 3 60 |
3. | 2022 Total 5 100 |
|-------------------------------------|
4. | 2023 Female 6 54.54546 |
5. | 2023 Male 5 45.45454 |
6. | 2023 Total 11 100 |
+-------------------------------------+
If that doesn't help (enough), please use dataex
to give an example (and see the stata
tag wiki).
Upvotes: 0