lifetea

How to find whether the most common value of a column appears more than X% of the time?

Consider the following dataframe:

  ID   Column
0 500  2
1 500  2
2 500  2
3 500  2
4 500  2
5 500  4

How can we check whether the most common value of 'Column' appears more than X% of the time?

I've tried df.locate[df.groupby('ID')['Column'].count_values(normalize=True).max() > X], but I get an error.

Answers (1)

Sander van den Oord

I think what you had was close to a solution. It's not entirely clear to me whether you want to calculate this over the whole column or per group, so here's a solution for both. You can change the variable at_least_this_proportion to adjust the minimum threshold:

import pandas as pd
from io import StringIO

text = """
  ID   Column
0 500  2
1 500  2
2 500  2
3 500  2
4 500  2
5 500  4
6 501  2
7 501  2
"""

df = pd.read_csv(StringIO(text), header=0, sep=r'\s+')  # raw string avoids an invalid-escape warning

# set minimum threshold
at_least_this_proportion = 0.5
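Calculate over the whole column: if the threshold should apply to 'Column' as a whole rather than per ID, a minimal sketch is to take the normalized value counts of the column and compare the largest share with the threshold. The DataFrame is rebuilt inline from the same sample data so the snippet runs on its own; the names shares, most_common_value, most_common_share and meets_threshold are illustrative, not part of the original answer.

```python
import pandas as pd

# Same sample data as above, rebuilt inline so this snippet is self-contained
df = pd.DataFrame({
    'ID': [500, 500, 500, 500, 500, 500, 501, 501],
    'Column': [2, 2, 2, 2, 2, 4, 2, 2],
})
at_least_this_proportion = 0.5

# Share of each distinct value over the whole column (shares sum to 1.0)
shares = df['Column'].value_counts(normalize=True)

# The most common value and the share of rows it occupies
most_common_value = shares.idxmax()
most_common_share = shares.max()

# True if the most common value appears in more than the threshold share of rows
meets_threshold = most_common_share > at_least_this_proportion
print(most_common_value, most_common_share, meets_threshold)
```

With this data the value 2 makes up 7 of the 8 rows (0.875), so the check passes for a threshold of 0.5.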

Calculate per group:

# find the IDs in which some value makes up more than at_least_this_proportion of its group
value_counts_per_group = df.groupby('ID')['Column'].value_counts(normalize=True)
ids_that_meet_threshold = value_counts_per_group[value_counts_per_group > at_least_this_proportion].index.get_level_values(0)

# get all rows for which the id meets the threshold
df[df['ID'].isin(ids_that_meet_threshold)]
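Since the question is specifically about the most common value, it can also help to look at that value's share per ID directly. A minimal sketch, assuming the same sample data: take the per-group normalized counts, keep the largest share in each ID with groupby(level='ID').max(), and map that per-ID result back onto the rows. An ID has some value above the threshold exactly when its most common value is above it, so this selects the same IDs as the isin filter above; the extra benefit is that mode_share_per_id (an illustrative name) shows the actual share for each ID.

```python
import pandas as pd

# Same sample data as above, rebuilt inline so this snippet is self-contained
df = pd.DataFrame({
    'ID': [500, 500, 500, 500, 500, 500, 501, 501],
    'Column': [2, 2, 2, 2, 2, 4, 2, 2],
})
at_least_this_proportion = 0.5

# Proportion of each value within its ID group (MultiIndex levels: ID, Column)
value_counts_per_group = df.groupby('ID')['Column'].value_counts(normalize=True)

# Largest proportion per ID, i.e. the share of that group's most common value
mode_share_per_id = value_counts_per_group.groupby(level='ID').max()

# Boolean per ID: does the most common value exceed the threshold?
id_meets_threshold = mode_share_per_id > at_least_this_proportion

# Map the per-ID boolean onto each row and keep the rows whose ID passes
result = df[df['ID'].map(id_meets_threshold)]
print(mode_share_per_id)
print(result)
```

Here ID 500 has a top share of 5/6 and ID 501 a top share of 1.0, so every row is kept at a threshold of 0.5; raising the threshold to 0.9 would keep only the rows of ID 501.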
