mrdst


Plotting dates in Pandas independent of year

I have a DataFrame of orders that records the date each order took place; there can be multiple orders per day. I've managed to plot the number of orders per day in DataFrame df with:

df.groupby('order_date')['order_id'].count().plot()

This data spans several years, and what I'm interested in is plotting the years on top of each other, so that the x-axis shows only the month and day. My current attempt looks like this:

grouped=df.groupby([df['order_date'].map(lambda x: x.year)])
groups=[]
for name,group in grouped:
    groups.append(group)
for group in groups:
    group.groupby([group['order_date'].map(lambda x: pd.to_datetime(str(x.month)+"-"+str(x.day), format="%m-%d"))])['order_id'].count().plot() 

I group all my data by year, then within each year I group by a month-day datetime built from the actual datetime in order_date. However, this gives me the following error:

 ValueError: Out of bounds nanosecond timestamp: 1-09-01 00:00:00

I assume this comes from one of my values, but I'm not sure what is actually wrong. Is there a simpler way to do what I want, or am I making a mistake in my code?
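For context: by default pandas stores timestamps as 64-bit nanosecond counts, so it can only represent dates from about 1677 to 2262, and the year 1 in the error message lies outside that range. A month-day key therefore still needs some real year. A minimal sketch of the same group-by-year idea, assuming every month-day key is anchored in the leap year 2000 (so February 29 stays valid), might look like the following. The sample data is invented; only the column names `order_date` and `order_id` come from the question, and the plotting call is left as a comment.

```python
import numpy as np
import pandas as pd

# By default pandas timestamps are nanosecond-based and only span roughly 1677-2262.
print("representable range:", pd.Timestamp.min, "to", pd.Timestamp.max)

# Invented sample data; only the column names come from the question.
rng = np.random.default_rng(0)
days = pd.date_range("2010-01-01", "2012-12-31", freq="D")
df = pd.DataFrame({
    "order_date": rng.choice(days.to_numpy(), size=5000),
    "order_id": np.arange(5000),
})

counts = {}
for year, group in df.groupby(df["order_date"].dt.year):
    # Rebuild each date in the leap year 2000 so Feb 29 remains a valid key.
    month_day = pd.to_datetime(pd.DataFrame({
        "year": 2000,
        "month": group["order_date"].dt.month,
        "day": group["order_date"].dt.day,
    }))
    # Count this year's orders per month-day key.
    counts[year] = group.groupby(month_day)["order_id"].count()

# One column per year, indexed by the shared 2000-based month-day dates.
table = pd.DataFrame(counts).fillna(0)
# table.plot() would then draw each year on the same axes (requires matplotlib).
```

Because every year is mapped onto the same 2000-based dates, the columns of `table` share one x-axis, which is what overlaying the years requires.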

Upvotes: 0

Views: 2220

Answers (1)

HYRY


If you want to plot the years on top of each other, the x-axis must cover the same date range for every year. To handle leap years, you can shift all dates to the year 2000, which is itself a leap year, so February 29 stays valid. Here is my try:

import numpy as np
import pandas as pd

### create sample data
date = pd.date_range("2010-01-01", periods=365*3)
date = pd.Index(np.random.choice(date, 30000))
order_id = np.random.randint(10, 1000, size=30000)

df = pd.DataFrame({"date":date, "order_id":order_id})

### group by year and date
date = pd.Index(df["date"])
df2 = df["order_id"].groupby([date.year, date]).count()

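### shift all years to 2000
date = df2.index.get_level_values(1)
# pd.io.date_converters is gone from recent pandas; assemble the dates from year/month/day columns instead
new_date = pd.DatetimeIndex(pd.to_datetime(pd.DataFrame({"year": 2000, "month": date.month, "day": date.day})))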
year = df2.index.get_level_values(0)
df2.index = pd.MultiIndex.from_arrays([year, new_date])

### plot
p = df2.unstack(0).plot()
p.xaxis.set_ticklabels(range(1, 13));

Output: one line of daily order counts per year, drawn against a shared 2000-based date axis (the plot image is not available).
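The key step in this answer is rebuilding every date with the year set to 2000. A small sketch of just that step, on three made-up dates, shows why a leap year matters. It assembles dates with `pd.to_datetime` on a DataFrame of year/month/day columns (a substitute for `parse_date_fields`, which is no longer available in recent pandas releases); the helper `with_year` is hypothetical and exists only for this illustration.

```python
import pandas as pd

# Three sample dates (invented), including the leap day 2012-02-29.
dates = pd.DatetimeIndex(["2011-03-01", "2012-02-29", "2013-12-31"])

def with_year(idx, year):
    """Hypothetical helper: rebuild each date in idx with the given year."""
    parts = pd.DataFrame({"year": year, "month": idx.month, "day": idx.day})
    return pd.DatetimeIndex(pd.to_datetime(parts))

# 2000 is a leap year, so the leap day maps cleanly onto 2000-02-29.
shifted = with_year(dates, 2000)
print(shifted.strftime("%Y-%m-%d").tolist())

# A non-leap year such as 2001 has no February 29, so assembling fails.
try:
    with_year(dates, 2001)
    non_leap_ok = True
except ValueError as exc:
    non_leap_ok = False
    print("year 2001 fails:", exc)
```

Any leap year inside pandas' timestamp range would work as the anchor; 2000 is simply a convenient choice.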

Upvotes: 2
