Using pandas, I'd like to "groupby" and calculate the mean value for each group of my DataFrame. I do it like this:
import pandas as pd

# Renamed from "dict" to avoid shadowing the built-in
data = {
    "group": ["A", "B", "C", "A", "A", "B", "B", "C", "A"],
    "value": [5, 6, 8, 7, 3, 9, 4, 6, 5],
}
df = pd.DataFrame(data)
print(df)

# Mean of "value" for each group
g = df.groupby("group").mean()
print(g)
Which gives me:

          value
group
A      5.000000
B      6.333333
C      7.000000
However, I'd like to exclude groups that have fewer than, say, 3 entries, so that the mean is somewhat meaningful. In this case, group "C" (only 2 entries) would be excluded from the results. How can I implement this?
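As a quick diagnostic (a sketch, reusing the same sample data), the number of rows per group can be counted with `size()`, which shows which groups fall below the 3-entry threshold. The names `sizes` and `small_groups` are introduced here for illustration only.

```python
import pandas as pd

data = {
    "group": ["A", "B", "C", "A", "A", "B", "B", "C", "A"],
    "value": [5, 6, 8, 7, 3, 9, 4, 6, 5],
}
df = pd.DataFrame(data)

# Number of rows in each group (A: 4, B: 3, C: 2)
sizes = df.groupby("group").size()
print(sizes)

# Groups below the 3-entry threshold
small_groups = sizes[sizes < 3].index.tolist()
print(small_groups)  # ['C']
```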
Filter the groups based on their length with groupby(...).filter(...), and then take the mean.
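# Keep only the rows belonging to groups with at least 3 entries
filtered = df.groupby('group').filter(lambda x: len(x) >= 3)

# Overall mean of the remaining rows
overall_mean = filtered['value'].mean()

# If you want the mean per group after filtering out the small groups
result = filtered.groupby('group').mean().reset_index()
print(result)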
Output:

  group     value
0     A  5.000000
1     B  6.333333
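Two alternatives to filter() with a lambda, shown as a sketch on the same sample data: the first broadcasts each group's size back to its rows with transform("size") and uses it as a boolean mask (this avoids calling a Python function per group, which is likely faster on large frames); the second computes the mean and the count together with agg() and then keeps only the groups with enough rows. The threshold variable min_size and the other intermediate names are introduced here for illustration.

```python
import pandas as pd

data = {
    "group": ["A", "B", "C", "A", "A", "B", "B", "C", "A"],
    "value": [5, 6, 8, 7, 3, 9, 4, 6, 5],
}
df = pd.DataFrame(data)
min_size = 3  # hypothetical name for the threshold from the question

# Alternative 1: each row gets the size of its group, then mask and average
group_size = df.groupby("group")["value"].transform("size")
result = df[group_size >= min_size].groupby("group", as_index=False)["value"].mean()
print(result)

# Alternative 2: compute mean and count per group, then keep large-enough groups
stats = df.groupby("group")["value"].agg(["mean", "count"])
filtered_means = stats.loc[stats["count"] >= min_size, "mean"]
print(filtered_means)
```

Both variants drop group "C" and give the same per-group means as the filter() version; the agg() variant also keeps the counts around if you want to report them next to the means.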