Dasphillipbrau
Dasphillipbrau

Reputation: 582

Getting a unique count within a range of months in a Pandas DataFrame

I have a Dataframe which contains customer names, amount of orders and the date in which they ordered.

I want to know, how many customers I had in a range of months. So, count the unique customer names within June to October, to put an example.

I have tried Cust_per_month = raw_data[['Customer']].groupby(raw_data.PDate.dt.month).nunique()

But this returns a series with counts for each individual month, whereas I need to know in ranges, from June to October, then from June to December.

I was thinking about creating a condition where it would only count a customer if any of the integers related to the months I'm interested in would show up, but this seems pretty clunky in my book.

Upvotes: 1

Views: 771

Answers (2)

MkWTF
MkWTF

Reputation: 1372

If you need a more generic approach, you can use something like this:

# Create list of ranges you want to see (it can also be a dataframe, using
# the df.iterrows function in the next step
month_range_list = [
    {"name": "June - October",
     "lower": 6,
     "upper": 10},
    {"name": "June - December",
     "lower": 6,
     "upper": 12}
]

# Expand the list into all elements within the range you defined (you can also use
# dates, and then expand them with datetime.timedelta)
range_df = pd.concat([ 
    pd.DataFrame({ "name": [ p["name"] ], "month": [ month ] })
    for p in month_range_list
    for month in range(p["lower"], p["upper"]+1)
])

# Merge the range_df with your raw_data, and calculate for each grouping the 
# number of unique customers
raw_data \
    .assign(month=raw_data.calendar_date.dt.month) \
    .merge(range_df, on="month") \
    .groupby("name") \
    ["Customer"] \
    .nunique()

Upvotes: 1

ALollz
ALollz

Reputation: 59549

I would mask the original DataFrame then calculate. groupby is more useful with unique, non-overlapping groups, or with a fixed window (groupby.rolling), neither of which are applicable here.

Sample Data

import string
import pandas
import numpy

np.random.seed(42)
raw_data = pd.DataFrame({'PDate': pd.date_range('2010-01-01', freq='45D', periods=50),
                         'Customer': np.random.choice(list(string.ascii_lowercase), 50)})

Code

m1 = raw_data.PDate.dt.month.between(6, 10, inclusive=True) # [June, October]
m2 = raw_data.PDate.dt.month.between(6, 12, inclusive=True) # [June, December]

raw_data[m1].Customer.nunique()
# 14

raw_data[m2].Customer.nunique()
# 17

Upvotes: 1

Related Questions