Reputation: 1
I have a categorical variable and am trying to create a new variable that multiplies each response value by its frequency. For example:
      total |      Freq.
------------+-----------
          1 |          6
          2 |         12
          3 |          9
          5 |          5
          6 |         10
I would like a variable that holds, for each response, the value multiplied by its frequency (i.e. 1=6, 2=24, 3=27, etc.). I tried a few calculations using egen, but they did not work. Please let me know if you have any insight.
Upvotes: 0
Views: 79
Reputation: 9470
It's not clear whether you want the result stored as a variable in your original dataset or in a new dataset with one row per category. This code does both:
clear
input catvar n
1 6
2 12
3 9
5 5
6 10
end
/* create fake catvar data */
expand n
drop n
/* store desired data in a variable in your data */
bysort catvar: gen sum = _N
replace sum = sum*catvar
list in 1/6, clean noobs
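/* c() is the pre-Stata 17 table syntax; in Stata 17 or later, try:
   table catvar, statistic(mean sum) statistic(frequency) */
table catvar, c(mean sum freq)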
/* or get a new dataset with desired data */
contract catvar sum, freq(n)
list, clean noobs
Upvotes: 2
Reputation: 887
I think that this example should show you the general tactic:
sysuse auto, clear
bysort rep78: egen count_rep78 = count(rep78)
gen freq_x_val = rep78*count_rep78
browse rep78 count_rep78 freq_x_val
In this example rep78 is the categorical variable.
Essentially, the bysort step creates a count variable that holds each category's frequency.
You then multiply that count variable by the categorical variable and you're done.
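Applied to the numbers in the question, the same tactic might look like the sketch below. It assumes the categorical variable is called total (taken from the tabulate header in the question) and that the data hold one row per response. The toy data and the names n, n_total and total_x_freq are made up for illustration.

```stata
* Toy data reproducing the frequencies in the question
* (assumption: the real data have one row per response)
clear
input total n
1 6
2 12
3 9
5 5
6 10
end
expand n
drop n

* Frequency of each category, stored on every row of that category
bysort total: egen n_total = count(total)

* Response value times its frequency
gen total_x_freq = total * n_total

* One line per category to check the result
tabstat total_x_freq, by(total) statistics(mean) nototal
```

With these toy data the check should show 6, 24, 27, 25 and 60 for categories 1, 2, 3, 5 and 6, which matches the values asked for in the question.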
Upvotes: 2