Alex
Alex

Reputation: 1391

select all elements in a group that meet a specific condition using pandas

I have a df that's grouped by id. For each id group, I would like to return all the rows where column a is greater than the mean of a for that group. I tried the following:

df = pd.DataFrame(data = {"a": np.arange(10), "b": np.arange(10)[::-1], "c": np.random.choice(a = np.arange(10), size = 10)}, index = pd.Index(data = np.random.choice(a = [1,2,3], size = 10), name = "id"))
df.groupby("id").apply(lambda x: x[x.a > x.a.mean()])

This raises a ValueError: Duplicated level name: "id", assigned to level 1, is already used for level 0.

What am I doing wrong?

Upvotes: 4

Views: 3288

Answers (1)

jezrael
jezrael

Reputation: 863166

Use transform for same Series as size of original DataFrame for better performance like apply solution:

df = df[df['a'] > df.groupby("id")['a'].transform('mean')]
print (df)
    a  b  c
id         
2   5  4  3
2   6  3  2
1   7  2  3
3   8  1  0
3   9  0  5

Detail:

print (df.groupby("id")['a'].transform('mean'))
id
1    4.75
1    4.75
1    4.75
3    3.50
3    3.50
1    4.75
1    4.75
1    4.75
1    4.75
1    4.75
Name: a, dtype: float64

In your solution need parameter group_keys=False for avoid MultiIndex with same levels names, because id in index name:

df = df.groupby("id", group_keys=False).apply(lambda x: x[x.a > x.a.mean()])

If first reset_index() get column name id and index name id, but there are same values:

df = df.reset_index().groupby("id").apply(lambda x: x[x.a > x.a.mean()])
print (df)
      id  a  b  c
id               
2  6   2  6  3  3
   7   2  7  2  9
   9   2  9  0  1
3  5   3  5  4  9
   8   3  8  1  8

Another test - removing index name - id:

df = df.rename_axis(None)
print (df)
   a  b  c
3  0  9  2
2  1  8  2
1  2  7  6
3  3  6  1
1  4  5  3
2  5  4  9
3  6  3  6
2  7  2  1
1  8  1  0
1  9  0  1

df = df.groupby(level=0).apply(lambda x: x[x.a > x.a.mean()])
print (df)
     a  b  c
1 1  8  1  0
  1  9  0  1
2 2  5  4  9
  2  7  2  1
3 3  6  3  6

Upvotes: 3

Related Questions