at.
at.

Reputation: 52500

Pandas groupby where one of the group values is in a range

I want to find the sizes of the groups that have at least one row with 0.5 < C < 1.0. Given a dataframe like this:

A B C
1 2 0.1
1 2 0.9
1 2 1.0
2 5 0
2 5 0.1
2 5 0.2
3 4 0.6

I'd like to see something like the following returned:

A B Size
1 2 3
3 4 1

I've tried the following:

group = dataset.groupby(['A', 'B'])
filtered = group.filter(lambda x: 0.5 < x['C'] < 1.0)
filtered.size()

However, I get this error on the second line:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

The any() function makes sense in this context as I want any value for C to be between 0.5 and 1.0 in order to count that group, but I don't know where to put the any() call. I tried calling it on the lambda. I tried after filter(). Nothing I try works...

Upvotes: 3

Views: 700

Answers (4)

BENY
BENY

Reputation: 323226

We can just do sum

out = df.C.between(.5,1).groupby([df['A'],df['B']]).sum().loc[lambda x : x!=0]
Out[180]: 
A  B
1  2    2
3  4    1
Name: C, dtype: int64

Upvotes: 1

piterbarg
piterbarg

Reputation: 8219

use any in the boolean indexing of the groupby:

df.groupby(['A','B']).size()[df.groupby(['A','B']).apply(lambda g:((g['C'] > 0.5) & (g['C'] < 1.0)).any())]

prints

A  B
1  2    3
3  4    1
dtype: int64

Upvotes: 1

Andrej Kesely
Andrej Kesely

Reputation: 195408

df["tmp"] = (df["C"] > 0.5) & (df["C"] < 1.0)
g = df.groupby(["A", "B"])
print(g.size()[g["tmp"].sum() > 0])

Prints:

A  B
1  2    3
3  4    1
dtype: int64

Upvotes: 4

PieCot
PieCot

Reputation: 3629

df.groupby(['A', 'B']).agg(
    {'C': lambda x: len(x) if np.logical_and(0.5 < x.values,  x.values < 1.).sum() else None}
).dropna().astype(int).rename(columns={'C': 'size'})

     size
A B      
1 2     3
3 4     1

Or if you prefer A and B as columns:

df.groupby(['A', 'B']).agg(
    {'C': lambda x: len(x) if np.logical_and(0.5 < x.values,  x.values < 1.).sum() else None}
).dropna().astype(int).rename(columns={'C': 'size'}).reset_index()

   A  B  size
0  1  2     3
1  3  4     1

Upvotes: 0

Related Questions