Alexandre_K
Alexandre_K

Reputation: 81

Perform Excel COUNTIFS in Pandas

Issue

I'm trying to do an Excel count if (count with conditions) in Pandas.

In the example bellow, I try to answer "how many cond == 1 for each couple key1 x key2?"

As output I would like to have 2 df:

I found a way to do it but I need to use a temporary step (step 1). I would like to know if there is a better way to perform this calculation

Data

import pandas as pd
d = {'key1': list('AAAACB'),
    'key2': list('XYXXXX'),
    'cond': [1, 2, 1, 2, 1, 2]}
df = pd.DataFrame(data=d)
df
#   key1 key2  cond
# 0    A    X     1
# 1    A    Y     2
# 2    A    X     1
# 3    A    X     2
# 4    C    X     1
# 5    B    X     2

Calculation

Step 1: temporay step to calculate cond at line level

import numpy as np

df.loc[:, "count_cond_1"] = np.where(df["cond"]==1, 1, 0)
df
#   key1 key2  cond  count_cond_1
# 0    A    X     1             1
# 1    A    Y     2             0
# 2    A    X     1             1
# 3    A    X     2             0
# 4    C    X     1             1
# 5    B    X     2             0

Step 2 : count if in a groupby df

pivot_key = ["key1", "key2"]

grpb = df.groupby(pivot_key, dropna=False)
df_grp = grpb.agg({"count_cond_1": "sum"}).reset_index()
df_grp
#   key1 key2  count_cond_1
# 0    A    X             2
# 1    A    Y             0
# 2    B    X             0
# 3    C    X             1

Step 2 bis: count if in the initial df

pivot_key=["key1", "key2"]
grpb = df.groupby(pivot_key, dropna=False)

df.loc[:, "count_cond_1"] = grpb["count_cond_1"].transform(sum)
df
#   key1 key2  cond  count_cond_1
# 0    A    X     1             2
# 1    A    Y     2             0
# 2    A    X     1             2
# 3    A    X     2             2
# 4    C    X     1             1
# 5    B    X     2             0

Upvotes: 1

Views: 185

Answers (1)

mozway
mozway

Reputation: 260300

You can achieve the whole thing in one step with groupby.transform:

df['count_cond_1'] = (df['cond'].eq(1)
                      .groupby([df['key1'], df['key2']])
                      .transform('sum')
                      )

output:

  key1 key2  cond  count_cond_1
0    A    X     1             2
1    A    Y     2             0
2    A    X     1             2
3    A    X     2             2
4    C    X     1             1
5    B    X     2             0

If you need the intermediate df_grp:

df_grp = (df['cond'].eq(1)
          .groupby([df['key1'], df['key2']])
          .agg(count_cond_1='sum').reset_index()
          )

df = df.merge(df_grp)

output:

# df_grp
  key1 key2  count_cond_1
0    A    X             2
1    A    Y             0
2    B    X             0
3    C    X             1

# df
  key1 key2  cond  count_cond_1
0    A    X     1             2
1    A    X     1             2
2    A    X     2             2
3    A    Y     2             0
4    C    X     1             1
5    B    X     2             0

Upvotes: 2

Related Questions