Lucien S.

Reputation: 5345

Exclude low sample counts from Pandas' "groupby" calculations

Using Pandas, I'd like to "groupby" and calculate the mean value for each group of my DataFrame. I do it like this:

import pandas as pd

# named "data" rather than "dict" to avoid shadowing the built-in
data = {
    "group": ["A", "B", "C", "A", "A", "B", "B", "C", "A"],
    "value": [5, 6, 8, 7, 3, 9, 4, 6, 5]
}
df = pd.DataFrame(data)
print(df)
g = df.groupby('group').mean()
print(g)

Which gives me:

          value
group          
A      5.000000
B      6.333333
C      7.000000

However, I'd like to exclude groups which have, let's say, fewer than 3 entries (so that the mean is somewhat meaningful). In this case, it would exclude group "C" from the results. How can I implement this?

Upvotes: 2

Views: 180

Answers (1)

Nk03

Reputation: 14949

Filter out the groups that are too small, then take the mean.

# overall mean of the values in groups with at least 3 rows
overall_mean = df.groupby('group').filter(lambda x: len(x) >= 3)['value'].mean()

# if you want the mean group-wise after filtering the required groups
result = df.groupby('group').filter(lambda x: len(x) >= 3).groupby('group').mean().reset_index()

Output:

  group     value
0     A  5.000000
1     B  6.333333
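
As a possible alternative, the same result can be had without a lambda by computing the mean and the row count in one aggregation and then dropping the small groups. This is only a sketch on the sample data from the question; the names min_count and stats are made up for illustration and the threshold of 3 is the one mentioned in the question.

```python
import pandas as pd

df = pd.DataFrame({
    "group": ["A", "B", "C", "A", "A", "B", "B", "C", "A"],
    "value": [5, 6, 8, 7, 3, 9, 4, 6, 5],
})

min_count = 3  # illustrative threshold, not a pandas parameter

# One pass over the groups: mean and number of rows per group
stats = df.groupby("group")["value"].agg(["mean", "count"])

# Keep only groups with enough rows, then restore the original column name
result = (
    stats.loc[stats["count"] >= min_count, ["mean"]]
    .rename(columns={"mean": "value"})
    .reset_index()
)
print(result)
```

Keeping the count column around (instead of dropping it) can also be handy if you want to report how many samples each mean is based on.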

Upvotes: 4
