How to replicated countifs functionality of excel in Pandas

Question

I'm trying to add a "Monthly orders" column which calculates how many transactions a customer with specific id had within a specific CohortDate.

Basically, it's a COUNTIFS function where RANGES are all IDS and all CohortDates equal to ID and CohortDate of any given row.

Any help is much appreciated.

import pandas as pd
import numpy as np

df = pd.DataFrame({'order_id': [75054,75057,75059,75061,75066],
                   'customer_id': [101692,101694,101734,101692,101694],
                   'CohortDate': ['2016-05','2016-05','2016-05','2016-05','2016-06'] 
                  })

The result I would aim to get is the following:

order_id    customer_id    CohortDate    Monthly_orders

75054    101692    '2016-05'    2

75057    101694    '2016-05'    1

75059    101734    '2016-05'    1

75061    101692    '2016-05'    2

75066    101694    '2016-06'    1

tohach · Accepted Answer

to group by certain variables, we can use the transform which applies the groupby accross a series rather than returning a new dataframe.

df.groupby(['customer_id','CohortDate'])['customer_id'].transform('count')

this returns the count across the original data frame.

order_id    customer_id CohortDate  count
0   75054   101692  2016-05 2
1   75057   101694  2016-05 1
2   75059   101734  2016-05 1
3   75061   101692  2016-05 2
4   75066   101694  2016-06 1

How to replicated countifs functionality of excel in Pandas

Answers (1)

Related Questions