Ji Xu

Reputation: 1

Generate sum variable for subgroups

I have a Steam dataset that contains individual Steam users, their overall playtimes, and the games they played. I have further divided the players into hardcore (=1) and casual (=0) players. Overall, I want to test how various factors influence players' overall playtime, but for now I want to build two regressions, one for hardcore players and one for casual players (because I think the effect of each factor can differ between the two groups). To do that, I need the sum of overall playtime for each of the two subgroups. I tried egen playtime_type = sum(playtime_sum), by(hightype), but the outcome just doesn't make sense. How can I aggregate the sum of playtime within each subgroup only?

Here is an example from the dataset:

steamid            playtime_sum  hightype
76561197960265729  0             0
76561197960265730  45            0
76561197960265730  45            0
76561197960265730  45            0
76561197960265733  1710          0
76561197960265733  1710          0
76561197960265733  1710          0
76561197960265733  1710          0
76561197960265733  1710          0
76561197960265738  11            0
76561197960265738  11            0
76561197960265738  11            0

Upvotes: 0

Views: 1887

Answers (1)

TheIceBear

Reputation: 3255

What does and does not make sense is highly subjective, so it is always better to explain what the output was and what you had expected instead.

My guess is that you want to use total() instead of sum().

egen playtime_type = total(playtime_sum), by(hightype)
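One more thing worth checking, judging only from the example rows: each steamid appears on several rows with the same playtime_sum, so a total over all rows would count the same player more than once. If that is what makes the result look wrong, a minimal sketch is to tag one row per player and total only the tagged rows. This assumes playtime_sum is constant within steamid; first_row and playtime_type_u are names made up for this example.

```stata
* Assumption: playtime_sum is identical on every row of the same steamid
* first_row and playtime_type_u are hypothetical variable names

* tag() marks exactly one row per distinct steamid with 1, all others with 0
egen byte first_row = tag(steamid)

* Multiplying by the tag zeroes out the repeated rows, so each player
* contributes playtime_sum once to the total of their hightype group
egen playtime_type_u = total(playtime_sum * first_row), by(hightype)
```

Every row in a group still receives the group total; only the way the total is computed changes.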

Upvotes: 1
