I have this DataFrame df:
ID EVAL
11 1
11 0
22 0
11 1
33 0
44 0
22 1
11 1
I need to estimate the percentage of rows with EVAL equal to 1 and equal to 0, within each of two groups. Group 1 contains the IDs that appear 3 or more times in df. Group 2 contains the IDs that appear fewer than 3 times in df.
The result should be:
GROUP EVAL_0 EVAL_1
1 25 75
2 75 25
You can get the fraction of IDs that are repeated three or more times by comparing value_counts() with 3 and taking the mean of the resulting boolean Series:
>>> (df.ID.value_counts() >= 3).mean()
0.25
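Step by step, value_counts() returns the number of rows for each distinct ID, the comparison turns that into one boolean per ID, and mean() counts True as 1 and False as 0. A small sketch of the intermediate values, rebuilding df from the sample data in the question:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "ID": [11, 11, 22, 11, 33, 44, 22, 11],
    "EVAL": [1, 0, 0, 1, 0, 0, 1, 1],
})

counts = df.ID.value_counts()  # rows per ID: 11 -> 4, 22 -> 2, 33 -> 1, 44 -> 1
mask = counts >= 3             # one boolean per distinct ID; True only for 11
share = mask.mean()            # True counts as 1 and False as 0, so 1 / 4
print(share)
```

Because the mean is taken over the 4 distinct IDs rather than the 8 rows, 0.25 is the share of IDs in group 1, not the share of rows (group 1 actually holds 4 of the 8 rows).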
This is the gist of it. If you want output shaped like yours, you can build a DataFrame from that value:
>>> import pandas as pd
>>> g1_perc = (df.ID.value_counts() >= 3).mean()
>>> pd.DataFrame(dict(group=[1, 2], perc_group=[g1_perc*100, (1-g1_perc)*100]))
group perc_group
0 1 25.0
1 2 75.0
The second column, holding the complementary percentage, looks a bit redundant to me.
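The table in the question, however, splits the EVAL values within each group of rows rather than counting IDs. A minimal sketch that reproduces it, assuming each row's group is decided by how many times its ID appears in df (looked up by mapping the same value_counts() result back onto the ID column) and reusing the column names GROUP, EVAL_0 and EVAL_1 from the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [11, 11, 22, 11, 33, 44, 22, 11],
    "EVAL": [1, 0, 0, 1, 0, 0, 1, 1],
})

# For every row, how many rows share its ID (11 -> 4, 22 -> 2, 33 -> 1, 44 -> 1)
id_size = df["ID"].map(df["ID"].value_counts())

# Group 1: IDs seen 3 or more times; group 2: IDs seen fewer than 3 times
group = pd.Series(np.where(id_size >= 3, 1, 2), index=df.index, name="GROUP")

# Row-normalised crosstab gives the EVAL split inside each group, scaled to percent
result = (
    pd.crosstab(group, df["EVAL"], normalize="index")
    .mul(100)
    .add_prefix("EVAL_")
    .reset_index()
)
result.columns.name = None  # drop the leftover "EVAL" label on the column axis
print(result)
```

With normalize="index", crosstab divides each row by its own total, so EVAL_0 and EVAL_1 add up to 100 within each group.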