blackmamba

Generate new variable from values per group

In Stata, I have calculated the percentage of observations per group, year, and category and stored it in a new variable.

group year category percentage
A 2020 1 0.4
A 2020 1 0.4
A 2020 2 0.6
A 2020 2 0.6
A 2020 2 0.6
B 2020 1 0.67
B 2020 1 0.67
B 2020 2 0.33

Now I want these percentages (per group, year, and category) in separate variables, one per category, with each value repeated on every observation of the same group and year. The result should look like this:

group year category percentage cat1 cat2
A 2020 1 0.4 0.4 0.6
A 2020 1 0.4 0.4 0.6
A 2020 2 0.6 0.4 0.6
A 2020 2 0.6 0.4 0.6
A 2020 2 0.6 0.4 0.6
B 2020 1 0.67 0.67 0.33
B 2020 1 0.67 0.67 0.33
B 2020 2 0.33 0.67 0.33

This is the code I used:

bysort group year category: gen cat_obs = _N 
bysort group year : gen sum_GY_obs = _N
gen mean = cat_obs / sum_GY_obs

foreach i of num 1/2 { 
    gen cat_`i' = mean if category == `i'
}


Answers (1)

Nick Cox

I would call your calculated results proportions or fractions (adding to 1 within each group and year), not percentages (adding to 100). As a proportion is (or can be considered as) the mean of an indicator variable, itself the result of evaluating a true or false expression (1 if true, 0 if false), this is one way to do it:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str6 group int year byte category double percentage
"A" 2020 1  .4
"A" 2020 1  .4
"A" 2020 2  .6
"A" 2020 2  .6
"A" 2020 2  .6
"B" 2020 1 .67
"B" 2020 1 .67
"B" 2020 2 .33
end

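* number of observations in each group-year-category cell
bysort group year category: gen num = _N
* number of observations in each group-year
bysort group year: gen den = _N

* proportion of its group-year that falls in this observation's category
gen prop = num / den

* category == j evaluates to 1 or 0; its mean within group and year
* is the proportion of observations in category j
forval j = 1/2 {
    egen cat`j' = mean(category == `j'), by(group year)
}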

list 

     +---------------------------------------------------------------------------------+
     | group   year   category   percen~e   num   den       prop       cat1       cat2 |
     |---------------------------------------------------------------------------------|
  1. |     A   2020          1         .4     2     5         .4         .4         .6 |
  2. |     A   2020          1         .4     2     5         .4         .4         .6 |
  3. |     A   2020          2         .6     3     5         .6         .4         .6 |
  4. |     A   2020          2         .6     3     5         .6         .4         .6 |
  5. |     A   2020          2         .6     3     5         .6         .4         .6 |
     |---------------------------------------------------------------------------------|
  6. |     B   2020          1        .67     2     3   .6666667   .6666667   .3333333 |
  7. |     B   2020          1        .67     2     3   .6666667   .6666667   .3333333 |
  8. |     B   2020          2        .33     1     3   .3333333   .6666667   .3333333 |
     +---------------------------------------------------------------------------------+
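The loop above hard-codes categories 1 and 2. If the categories are not known in advance, one possible generalization, assuming integer-coded categories, loops over the values that `levelsof` finds. The data below are hypothetical, with a third category that group B lacks, just to show what happens in that case.

```stata
* Sketch: one variable per observed category, without hard-coding 1/2
* (hypothetical data: group B has no observations in category 3)
clear
input str1 group int year byte category
"A" 2020 1
"A" 2020 2
"A" 2020 3
"A" 2020 3
"B" 2020 1
"B" 2020 2
end

* levelsof stores the distinct values of category (here 1 2 3) in a local macro
levelsof category, local(levels)
foreach j of local levels {
    * mean of the 0/1 indicator = proportion of the group-year in category j
    egen cat`j' = mean(category == `j'), by(group year)
}

list, sepby(group)
```

Because `category == 3` is simply false throughout group B, `cat3` is 0 there rather than missing.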

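An alternative sketch keeps the question's own variables (`cat_obs`, `sum_GY_obs`, `mean`) and changes only the loop. `gen ... if` fills `cat_i` just on observations of category i and leaves the rest missing; `egen max()` applied to a `cond()` expression instead copies that single value to every observation in the same group and year, since `max()` ignores missing values.

```stata
* Sketch: the question's approach, with the per-category value spread
* to all observations of the same group and year
clear
input str1 group int year byte category
"A" 2020 1
"A" 2020 1
"A" 2020 2
"A" 2020 2
"A" 2020 2
"B" 2020 1
"B" 2020 1
"B" 2020 2
end

bysort group year category: gen cat_obs = _N
bysort group year: gen sum_GY_obs = _N
gen mean = cat_obs / sum_GY_obs

forvalues i = 1/2 {
    * cond() is missing outside category i, and max() ignores missing values
    egen cat_`i' = max(cond(category == `i', mean, .)), by(group year)
}

list, sepby(group year)
```

Unlike the `mean(category == j)` approach, this gives missing rather than 0 for a category that does not occur in a group and year.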