Reputation: 52500
I want to find the sizes of the groups that have at least one row with 0.5 < C < 1.0
. Given a dataframe like this:
A | B | C |
---|---|---|
1 | 2 | 0.1 |
1 | 2 | 0.9 |
1 | 2 | 1.0 |
2 | 5 | 0 |
2 | 5 | 0.1 |
2 | 5 | 0.2 |
3 | 4 | 0.6 |
I'd like to see something like the following returned:
A | B | Size |
---|---|---|
1 | 2 | 3 |
3 | 4 | 1 |
I've tried the following:
group = dataset.groupby(['A', 'B'])
filtered = group.filter(lambda x: 0.5 < x['C'] < 1.0)
filtered.size()
However, I get this error on the second line:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
The any()
function makes sense in this context as I want any value for C
to be between 0.5 and 1.0 in order to count that group, but I don't know where to put the any()
call. I tried calling it on the lambda. I tried after filter()
. Nothing I try works...
Upvotes: 3
Views: 700
Reputation: 323226
We can just do sum
out = df.C.between(.5,1).groupby([df['A'],df['B']]).sum().loc[lambda x : x!=0]
Out[180]:
A B
1 2 2
3 4 1
Name: C, dtype: int64
Upvotes: 1
Reputation: 8219
use any
in the boolean indexing of the groupby
:
df.groupby(['A','B']).size()[df.groupby(['A','B']).apply(lambda g:((g['C'] > 0.5) & (g['C'] < 1.0)).any())]
prints
A B
1 2 3
3 4 1
dtype: int64
Upvotes: 1
Reputation: 195408
df["tmp"] = (df["C"] > 0.5) & (df["C"] < 1.0)
g = df.groupby(["A", "B"])
print(g.size()[g["tmp"].sum() > 0])
Prints:
A B
1 2 3
3 4 1
dtype: int64
Upvotes: 4
Reputation: 3629
df.groupby(['A', 'B']).agg(
{'C': lambda x: len(x) if np.logical_and(0.5 < x.values, x.values < 1.).sum() else None}
).dropna().astype(int).rename(columns={'C': 'size'})
size
A B
1 2 3
3 4 1
Or if you prefer A
and B
as columns:
df.groupby(['A', 'B']).agg(
{'C': lambda x: len(x) if np.logical_and(0.5 < x.values, x.values < 1.).sum() else None}
).dropna().astype(int).rename(columns={'C': 'size'}).reset_index()
A B size
0 1 2 3
1 3 4 1
Upvotes: 0