Dinosaurius

Group data by frequency and estimate % per group

I have this DataFrame df:

ID  EVAL
11  1
11  0
22  0
11  1
33  0
44  0
22  1
11  1

I need to compute the percentage of rows with EVAL equal to 0 and to 1 within each of two groups. Group 1 contains the rows whose ID appears 3 or more times in df. Group 2 contains the rows whose ID appears fewer than 3 times in df.

The result should be (values are percentages):

GROUP    EVAL_0    EVAL_1       
1        25        75
2        75        25

Answers (1)

miradulo

You can get the fraction of distinct IDs that are repeated three or more times by calling value_counts(), comparing the counts with >= 3 to get a boolean mask, and taking the mean of that mask.

>>> (df.ID.value_counts() >= 3).mean()
0.25
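The 0.25 above is a share of distinct IDs (1 of the 4 IDs is seen 3+ times), not a share of rows. A small sketch to make that distinction visible, assuming df holds exactly the sample data from the question; the row-weighted variant using map() is added only for comparison and is not part of the original answer:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "ID":   [11, 11, 22, 11, 33, 44, 22, 11],
    "EVAL": [1, 0, 0, 1, 0, 0, 1, 1],
})

# Occurrences per distinct ID: 11 -> 4, 22 -> 2, 33 -> 1, 44 -> 1
counts = df["ID"].value_counts()

# Fraction of distinct IDs seen 3 or more times (what the answer computes)
share_of_ids = (counts >= 3).mean()

# Fraction of rows whose ID is seen 3 or more times (each row weighted equally)
share_of_rows = (df["ID"].map(counts) >= 3).mean()

print(share_of_ids, share_of_rows)
```

With this data the first value is 0.25 and the second is 0.5, because ID 11 alone accounts for 4 of the 8 rows.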

That is the gist of it. If you want output shaped like yours, you can wrap the value in a DataFrame:

>>> import pandas as pd
>>> g1_perc = (df.ID.value_counts() >= 3).mean()
>>> pd.DataFrame(dict(group=[1, 2], perc_group=[g1_perc*100, (1-g1_perc)*100]))
   group  perc_group
0      1        25.0
1      2        75.0

The second column, which only holds the complementary percentage, looks a bit redundant to me.
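For the table in the question, the percentages have to be taken over the rows of each group rather than over distinct IDs. A minimal sketch, assuming df holds exactly the sample data from the question; mapping value_counts() back onto the rows and using pd.crosstab with normalize="index" is one possible approach, and the names GROUP, EVAL_0 and EVAL_1 are chosen only to match the expected output:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "ID":   [11, 11, 22, 11, 33, 44, 22, 11],
    "EVAL": [1, 0, 0, 1, 0, 0, 1, 1],
})

# For every row, how many times its ID occurs in df
id_count = df["ID"].map(df["ID"].value_counts())

# Group 1: ID occurs 3 or more times; Group 2: fewer than 3 times
group = (id_count >= 3).map({True: 1, False: 2}).rename("GROUP")

# normalize="index" divides each row of the crosstab by its total,
# giving the share of EVAL == 0 / EVAL == 1 within each group
result = pd.crosstab(group, df["EVAL"], normalize="index").mul(100)
result.columns = [f"EVAL_{c}" for c in result.columns]

print(result)
```

With the sample data this gives 25/75 for group 1 and 75/25 for group 2, matching the expected table. Keeping only one of the two columns gives the single-column form.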
