Reputation: 81
I'm trying to do an Excel count if (count with conditions) in Pandas.
In the example bellow, I try to answer "how many cond == 1
for each couple key1 x key2
?"
As output I would like to have 2 df:
I found a way to do it but I need to use a temporary step (step 1). I would like to know if there is a better way to perform this calculation
import pandas as pd
d = {'key1': list('AAAACB'),
'key2': list('XYXXXX'),
'cond': [1, 2, 1, 2, 1, 2]}
df = pd.DataFrame(data=d)
df
# key1 key2 cond
# 0 A X 1
# 1 A Y 2
# 2 A X 1
# 3 A X 2
# 4 C X 1
# 5 B X 2
import numpy as np
df.loc[:, "count_cond_1"] = np.where(df["cond"]==1, 1, 0)
df
# key1 key2 cond count_cond_1
# 0 A X 1 1
# 1 A Y 2 0
# 2 A X 1 1
# 3 A X 2 0
# 4 C X 1 1
# 5 B X 2 0
pivot_key = ["key1", "key2"]
grpb = df.groupby(pivot_key, dropna=False)
df_grp = grpb.agg({"count_cond_1": "sum"}).reset_index()
df_grp
# key1 key2 count_cond_1
# 0 A X 2
# 1 A Y 0
# 2 B X 0
# 3 C X 1
pivot_key=["key1", "key2"]
grpb = df.groupby(pivot_key, dropna=False)
df.loc[:, "count_cond_1"] = grpb["count_cond_1"].transform(sum)
df
# key1 key2 cond count_cond_1
# 0 A X 1 2
# 1 A Y 2 0
# 2 A X 1 2
# 3 A X 2 2
# 4 C X 1 1
# 5 B X 2 0
Upvotes: 1
Views: 185
Reputation: 260300
You can achieve the whole thing in one step with groupby.transform
:
df['count_cond_1'] = (df['cond'].eq(1)
.groupby([df['key1'], df['key2']])
.transform('sum')
)
output:
key1 key2 cond count_cond_1
0 A X 1 2
1 A Y 2 0
2 A X 1 2
3 A X 2 2
4 C X 1 1
5 B X 2 0
If you need the intermediate df_grp
:
df_grp = (df['cond'].eq(1)
.groupby([df['key1'], df['key2']])
.agg(count_cond_1='sum').reset_index()
)
df = df.merge(df_grp)
output:
# df_grp
key1 key2 count_cond_1
0 A X 2
1 A Y 0
2 B X 0
3 C X 1
# df
key1 key2 cond count_cond_1
0 A X 1 2
1 A X 1 2
2 A X 2 2
3 A Y 2 0
4 C X 1 1
5 B X 2 0
Upvotes: 2