Reputation: 1030
Let's say I have the following data sample:
import pandas as pd

df = pd.DataFrame({
    'date': ['2011-01-01', '2011-01-02', '2011-01-03', '2011-01-04',
             '2011-01-05', '2011-01-06', '2011-01-07', '2011-01-08',
             '2011-01-09', '2011-12-30', '2011-12-31'],
    'revenue': [5, 3, 2, 10, 12, 2, 1, 0, 6, 10, 12],
})
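# Parse the dates, then add the ISO week number and the calendar year
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
# dt.week was removed in pandas 2.0; dt.isocalendar().week gives the same ISO week number
df['week_number'] = df['date'].dt.isocalendar().week
df['year'] = df['date'].dt.year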
df
          date  revenue  week_number  year
0   2011-01-01        5           52  2011
1   2011-01-02        3           52  2011
2   2011-01-03        2            1  2011
3   2011-01-04       10            1  2011
4   2011-01-05       12            1  2011
5   2011-01-06        2            1  2011
6   2011-01-07        1            1  2011
7   2011-01-08        0            1  2011
8   2011-01-09        6            1  2011
9   2011-12-30       10           52  2011
10  2011-12-31       12           52  2011
I would like to compute the revenue per week so that I can later plot the results and analyze the time series. The expected output would look something like this:
week revenue
0 1 8
1 2 33
2 52 22
I first thought of using the week number given by timestamp.week. However, I can't figure out how to deal with the ISO week number definition for the days that precede week 1. I am a bit confused, since grouping by week_number would in that case sum the revenue from the very beginning of the year together with the revenue from the end of the year, because both fall in week 52.
Upvotes: 6
Views: 14776
Reputation: 703
You could use the date column as the index and then resample the time series.
df.index = pd.to_datetime(df['date'])
# select the revenue column so that only numeric data is summed
df['revenue'].resample('W').sum()
With that solution you don't even need the week and year columns.
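To make the resample approach concrete, here is a minimal sketch on the sample data from the question. The names `grouped`, `weekly` and `non_empty` are only illustrative. One thing worth knowing: `'W'` bins end on Sunday and are labelled with that Sunday, and `resample` emits a row for every week in the range, including weeks that contain no data.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime([
        "2011-01-01", "2011-01-02", "2011-01-03", "2011-01-04",
        "2011-01-05", "2011-01-06", "2011-01-07", "2011-01-08",
        "2011-01-09", "2011-12-30", "2011-12-31",
    ]),
    "revenue": [5, 3, 2, 10, 12, 2, 1, 0, 6, 10, 12],
})

# Weekly bins ending on Sunday, labelled by that Sunday
grouped = df.set_index("date")["revenue"].resample("W")
weekly = grouped.sum()

# Keep only the weeks that actually contain rows; filtering on the count
# (not the sum) preserves weeks whose revenue legitimately adds up to 0
non_empty = weekly[grouped.count() > 0]
print(non_empty)
```

Because the bins follow the calendar rather than a week number, the last two days of December simply land in the bin that ends on Sunday 2012-01-01, so nothing from January gets mixed in with them.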
Upvotes: 4
Reputation: 13437
I think you should be extremely careful in this case. If you want the weekly revenue across many years, you might consider moving the first few days of January to the last week of 2010:
import pandas as pd
import numpy as np
date = pd.date_range(start="2011-01-01", end="2011-01-09")
date = [str(d)[:10] for d in date] + ["2011-12-30", "2011-12-31"]
rev = np.random.randint(1,10, len(date))
df = pd.DataFrame({"date": date, "rev":rev})
df["date"] = df["date"].astype("M8[us]")
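# dt.week was removed in pandas 2.0; isocalendar().week is the ISO week number
df["week"] = df["date"].dt.isocalendar().week.astype(int)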
df["year"] = df["date"].dt.year
df["year"] = np.where((df["week"]==52) & (df["date"].dt.month==1),
df["year"]-1,
df["year"])
df.groupby(["year", "week"])["rev"].sum()
If, alternatively, you are happy with having a week 0 at the start of the year, you could use this instead:
df["week"] = np.where((df["week"]==52) & (df["date"].dt.month==1),
0,
df["week"])
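A possible alternative to patching the year by hand, sketched under the assumption that pandas 1.1 or later is available: `dt.isocalendar()` returns the ISO year alongside the ISO week, so grouping on both keeps January 1 and 2, 2011 in week 52 of 2010. The frame below is the sample from the question; `iso` and `weekly` are just illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime([
        "2011-01-01", "2011-01-02", "2011-01-03", "2011-01-04",
        "2011-01-05", "2011-01-06", "2011-01-07", "2011-01-08",
        "2011-01-09", "2011-12-30", "2011-12-31",
    ]),
    "revenue": [5, 3, 2, 10, 12, 2, 1, 0, 6, 10, 12],
})

# isocalendar() returns a frame with the columns year, week and day,
# all following the ISO 8601 definition
iso = df["date"].dt.isocalendar()

# Group on (ISO year, ISO week) so that week 52 of 2010 and
# week 52 of 2011 stay separate
weekly = df.groupby([iso["year"], iso["week"]])["revenue"].sum()
print(weekly)
```

The result is indexed by (year, week), which also sorts chronologically and so should be convenient for plotting.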
Upvotes: 0
Reputation: 323226
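When you convert using dt.week, you get the ISO week number, which is why the first two days of January 2011 land in week 52.
You can use strftime with %W instead. It counts weeks starting on Monday and puts any days before the first Monday of the year in week 00: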
df.groupby(df.date.dt.strftime('%W')).revenue.sum()
Out[588]:
date
00 8
01 33
52 22
Name: revenue, dtype: int64
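If the data spans several years, grouping on `%W` alone would again merge the same week number from different years. A small sketch of one way around that, assuming a combined `%Y-%W` key is acceptable as the group label (`key` and `weekly` are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime([
        "2011-01-01", "2011-01-02", "2011-01-03", "2011-01-04",
        "2011-01-05", "2011-01-06", "2011-01-07", "2011-01-08",
        "2011-01-09", "2011-12-30", "2011-12-31",
    ]),
    "revenue": [5, 3, 2, 10, 12, 2, 1, 0, 6, 10, 12],
})

# Calendar year plus Monday-based week number, e.g. "2011-00"
key = df["date"].dt.strftime("%Y-%W")

# One group per (year, week) pair; the string keys sort chronologically
weekly = df.groupby(key)["revenue"].sum()
print(weekly)
```

Keep in mind that with `%W` a week straddling New Year is split in two: the days in December go to the last week of the old year and the days in January go to week 00 of the new one.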
Upvotes: 11