Reputation: 15
In Stata, I have calculated the percentage of observations per group, year, and category and stored it in a new variable.
group | year | category | percentage |
---|---|---|---|
A | 2020 | 1 | 0.4 |
A | 2020 | 1 | 0.4 |
A | 2020 | 2 | 0.6 |
A | 2020 | 2 | 0.6 |
A | 2020 | 2 | 0.6 |
B | 2020 | 1 | 0.67 |
B | 2020 | 1 | 0.67 |
B | 2020 | 2 | 0.33 |
Now I want these percentages in separate variables, one per category, with each value repeated on every row of its group and year. The result should look like this:
group | year | category | percentage | cat1 | cat2 |
---|---|---|---|---|---|
A | 2020 | 1 | 0.4 | 0.4 | 0.6 |
A | 2020 | 1 | 0.4 | 0.4 | 0.6 |
A | 2020 | 2 | 0.6 | 0.4 | 0.6 |
A | 2020 | 2 | 0.6 | 0.4 | 0.6 |
A | 2020 | 2 | 0.6 | 0.4 | 0.6 |
B | 2020 | 1 | 0.67 | 0.67 | 0.33 |
B | 2020 | 1 | 0.67 | 0.67 | 0.33 |
B | 2020 | 2 | 0.33 | 0.67 | 0.33 |
So code-wise I did the following:
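* Observations per group-year-category cell and per group-year
bysort group year category: gen cat_obs = _N
bysort group year: gen sum_GY_obs = _N
* Share of each category within its group-year
gen mean = cat_obs / sum_GY_obs
* cat_`i' is filled only where category == `i' and is missing elsewhere
foreach i of numlist 1/2 {
    gen cat_`i' = mean if category == `i'
}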
Upvotes: 0
Views: 567
Reputation: 37208
I would call your calculated results proportions or fractions (adding to 1), not percentages (adding to 100). As a proportion is (or can be considered as) the mean of an indicator variable, itself the result of evaluating a true or false expression, this is one way to do it:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str6 group int year byte category double percentage
"A" 2020 1 .4
"A" 2020 1 .4
"A" 2020 2 .6
"A" 2020 2 .6
"A" 2020 2 .6
"B" 2020 1 .67
"B" 2020 1 .67
"B" 2020 2 .33
end
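* Cell count, group-year count, and their ratio
bysort group year category: gen num = _N
bysort group year: gen den = _N
gen prop = num / den
* Mean of the 0/1 indicator (category == `j') within each group-year
forval j = 1/2 {
    egen cat`j' = mean(category == `j'), by(group year)
}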
list
     +---------------------------------------------------------------------------------+
     | group   year   category   percen~e   num   den       prop       cat1       cat2 |
     |---------------------------------------------------------------------------------|
  1. |     A   2020          1         .4     2     5         .4         .4         .6 |
  2. |     A   2020          1         .4     2     5         .4         .4         .6 |
  3. |     A   2020          2         .6     3     5         .6         .4         .6 |
  4. |     A   2020          2         .6     3     5         .6         .4         .6 |
  5. |     A   2020          2         .6     3     5         .6         .4         .6 |
     |---------------------------------------------------------------------------------|
  6. |     B   2020          1        .67     2     3   .6666667   .6666667   .3333333 |
  7. |     B   2020          1        .67     2     3   .6666667   .6666667   .3333333 |
  8. |     B   2020          2        .33     1     3   .3333333   .6666667   .3333333 |
     +---------------------------------------------------------------------------------+
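If the category codes are not known in advance, the same indicator-mean idea can probably be looped over whatever levels exist in the data. The following is only a sketch under assumptions: it assumes category holds nonnegative integer codes (so that cat`j' is a legal variable name), and the local macros levels and props and the variable total are names made up here for illustration. It uses double storage so that the final sum check is not disturbed by float rounding.

```stata
* Sketch: indicator means for every observed level of category.
* Assumes category holds nonnegative integer codes.
clear
input str6 group int year byte category
"A" 2020 1
"A" 2020 1
"A" 2020 2
"A" 2020 2
"A" 2020 2
"B" 2020 1
"B" 2020 1
"B" 2020 2
end

* Collect the distinct category codes (macro name is illustrative)
levelsof category, local(levels)

* One proportion variable per level; remember the names in `props'
local props
foreach j of local levels {
    egen double cat`j' = mean(category == `j'), by(group year)
    local props `props' cat`j'
}

* Sanity check: proportions within each group-year should sum to 1
egen double total = rowtotal(`props')
assert abs(total - 1) < 1e-8

list group year category `props', sepby(group year)
```

The variable list is built up inside the loop rather than written as cat* because that wildcard would also match category itself.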
Upvotes: 2