Reputation: 582
I have a Dataframe which contains customer names, amount of orders and the date in which they ordered.
I want to know, how many customers I had in a range of months. So, count the unique customer names within June to October, to put an example.
I have tried Cust_per_month = raw_data[['Customer']].groupby(raw_data.PDate.dt.month).nunique()
But this returns a series with counts for each individual month, whereas I need to know in ranges, from June to October, then from June to December.
I was thinking about creating a condition where it would only count a customer if any of the integers related to the months I'm interested in would show up, but this seems pretty clunky in my book.
Upvotes: 1
Views: 771
Reputation: 1372
If you need a more generic approach, you can use something like this:
# Create list of ranges you want to see (it can also be a dataframe, using
# the df.iterrows function in the next step
month_range_list = [
{"name": "June - October",
"lower": 6,
"upper": 10},
{"name": "June - December",
"lower": 6,
"upper": 12}
]
# Expand the list into all elements within the range you defined (you can also use
# dates, and then expand them with datetime.timedelta)
range_df = pd.concat([
pd.DataFrame({ "name": [ p["name"] ], "month": [ month ] })
for p in month_range_list
for month in range(p["lower"], p["upper"]+1)
])
# Merge the range_df with your raw_data, and calculate for each grouping the
# number of unique customers
raw_data \
.assign(month=raw_data.calendar_date.dt.month) \
.merge(range_df, on="month") \
.groupby("name") \
["Customer"] \
.nunique()
Upvotes: 1
Reputation: 59549
I would mask
the original DataFrame then calculate. groupby
is more useful with unique, non-overlapping groups, or with a fixed window (groupby.rolling
), neither of which are applicable here.
import string
import pandas
import numpy
np.random.seed(42)
raw_data = pd.DataFrame({'PDate': pd.date_range('2010-01-01', freq='45D', periods=50),
'Customer': np.random.choice(list(string.ascii_lowercase), 50)})
m1 = raw_data.PDate.dt.month.between(6, 10, inclusive=True) # [June, October]
m2 = raw_data.PDate.dt.month.between(6, 12, inclusive=True) # [June, December]
raw_data[m1].Customer.nunique()
# 14
raw_data[m2].Customer.nunique()
# 17
Upvotes: 1