plalanne

Reputation: 1030

Pandas groupby week given a datetime column

Let's say I have the following data sample:

import pandas as pd

df = pd.DataFrame({'date':['2011-01-01','2011-01-02',
                       '2011-01-03','2011-01-04','2011-01-05',
                       '2011-01-06','2011-01-07','2011-01-08',
                       '2011-01-09','2011-12-30','2011-12-31'],
                   'revenue':[5,3,2,
                              10,12,2,
                              1,0,6,10,12]})

# Let's format the date and add the week number and year
df['date'] = pd.to_datetime(df['date'],format='%Y-%m-%d')
# ISO week number (Series.dt.week was deprecated and removed in pandas 2.0)
df['week_number'] = df['date'].dt.isocalendar().week
df['year'] = df['date'].dt.year

df

        date        revenue     week_number     year
0       2011-01-01  5           52              2011
1       2011-01-02  3           52              2011
2       2011-01-03  2           1               2011
3       2011-01-04  10          1               2011
4       2011-01-05  12          1               2011
5       2011-01-06  2           1               2011
6       2011-01-07  1           1               2011
7       2011-01-08  0           1               2011
8       2011-01-09  6           1               2011
9       2011-12-30  10          52              2011
10      2011-12-31  12          52              2011

I would like to compute the revenue per week, in order to later plot the results and analyze the time series. The expected output would then be something like this:

    week    revenue
0   1       8
1   2       33
2   52      22

I first thought of using the week number given by timestamp.week.
However, I can't figure out how to deal with the ISO week-number definition for the days preceding week 1. I am a bit confused, since grouping by week_number would in that case sum both the revenue at the very beginning of the year and the revenue at the end of the year.

Upvotes: 6

Views: 14776

Answers (3)

Alex G

Reputation: 703

You could use the date column as the index and then resample the time series:

# Use the dates as the index and sum revenue in weekly bins (weeks ending on Sunday)
df.index = pd.to_datetime(df['date'])
df.resample('W')['revenue'].sum()

With this solution you don't even need the week and year columns.
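A minimal self-contained sketch of the resample approach, rebuilding the question's sample data. It assumes pandas 1.x or 2.x; 'W' is an alias for 'W-SUN', so each bin ends on a Sunday and is labelled with that Sunday's date. Resampling also creates bins for the empty weeks between January and December; passing min_count=1 turns those into NaN so they can be dropped without also dropping a real week whose revenue happens to be 0. The names weekly and nonempty are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime([
        "2011-01-01", "2011-01-02", "2011-01-03", "2011-01-04", "2011-01-05",
        "2011-01-06", "2011-01-07", "2011-01-08", "2011-01-09",
        "2011-12-30", "2011-12-31",
    ]),
    "revenue": [5, 3, 2, 10, 12, 2, 1, 0, 6, 10, 12],
})

# Weekly bins ending on Sunday ('W' == 'W-SUN'), labelled by the last day of each bin;
# min_count=1 makes bins with no rows NaN instead of 0
weekly = df.set_index("date").resample("W")["revenue"].sum(min_count=1)

# Keep only the weeks that actually contain data
nonempty = weekly.dropna()
print(nonempty)
```

The three remaining bins are labelled 2011-01-02, 2011-01-09 and 2012-01-01 and hold 8, 33 and 22, i.e. the question's expected totals, identified by week-ending date rather than by week number.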

Upvotes: 4

rpanai

Reputation: 13437

I think that in this case you should be extremely careful. If you want the weekly revenue across many years, you might consider moving the first few days of January to the last week of 2010:

import pandas as pd
import numpy as np

date = pd.date_range(start="2011-01-01", end="2011-01-09")
date = [str(d)[:10] for d in date] + ["2011-12-30", "2011-12-31"]
rev = np.random.randint(1, 10, len(date))
df = pd.DataFrame({"date": date, "rev": rev})
df["date"] = pd.to_datetime(df["date"])

# ISO week number as a plain int (Series.dt.week was removed in pandas 2.0)
df["week"] = df["date"].dt.isocalendar().week.astype(int)
df["year"] = df["date"].dt.year
# Early-January days that still belong to ISO week 52 are moved to the previous year
df["year"] = np.where((df["week"] == 52) & (df["date"].dt.month == 1),
                      df["year"] - 1,
                      df["year"])

df.groupby(["year", "week"])["rev"].sum()

Alternatively, if you are happy to have a week 0 at the start of the year, you might consider using this instead:

# Relabel early-January days that fall in ISO week 52 as week 0
df["week"] = np.where((df["week"]==52) & (df["date"].dt.month==1),
                      0, 
                      df["week"])
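The week-52 rule above covers this sample, but ISO years can also have a week 53 (for example, 2010-01-01 falls in ISO week 53 of 2009), and late-December days can belong to week 1 of the following ISO year (2013-12-30 is in ISO week 1 of 2014). A sketch that sidesteps both cases groups on the ISO year and ISO week returned by Series.dt.isocalendar() (available since pandas 1.1). It uses the question's sample revenue; the names iso and weekly are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime([
        "2011-01-01", "2011-01-02", "2011-01-03", "2011-01-04", "2011-01-05",
        "2011-01-06", "2011-01-07", "2011-01-08", "2011-01-09",
        "2011-12-30", "2011-12-31",
    ]),
    "revenue": [5, 3, 2, 10, 12, 2, 1, 0, 6, 10, 12],
})

# isocalendar() returns a DataFrame with ISO 'year', 'week' and 'day' columns
iso = df["date"].dt.isocalendar()

# Group on (ISO year, ISO week): 2011-01-01/02 land in 2010-W52, not 2011-W52
weekly = df.groupby([iso["year"], iso["week"]])["revenue"].sum()
print(weekly)
```

Because the ISO year travels with the ISO week, no month-based correction is needed: (2010, 52) holds 8, (2011, 1) holds 33 and (2011, 52) holds 22.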

Upvotes: 0

BENY

Reputation: 323226

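When you convert using dt.week, you get the ISO week number.

You can use strftime with %W instead: it numbers weeks starting on Monday, and all days of the year before the first Monday fall in week 00.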

df.groupby(df.date.dt.strftime('%W')).revenue.sum()
Out[588]: 
date
00     8
01    33
52    22
Name: revenue, dtype: int64
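Because %W yields only the week number, data spanning several years would also be merged by week number. A minimal variant, assuming you want one group per calendar year and week, prefixes the week with %Y. Since %W puts the days before the first Monday into week 00, no week straddles two calendar years, and the zero-padded strings sort chronologically. The name weekly is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime([
        "2011-01-01", "2011-01-02", "2011-01-03", "2011-01-04", "2011-01-05",
        "2011-01-06", "2011-01-07", "2011-01-08", "2011-01-09",
        "2011-12-30", "2011-12-31",
    ]),
    "revenue": [5, 3, 2, 10, 12, 2, 1, 0, 6, 10, 12],
})

# %Y = calendar year, %W = week of year (Monday first; days before the first Monday are week 00)
weekly = df.groupby(df["date"].dt.strftime("%Y-%W"))["revenue"].sum()
print(weekly)
```

For the sample data this yields the keys 2011-00, 2011-01 and 2011-52 with 8, 33 and 22, matching the output above but remaining unambiguous once more years are added.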

Upvotes: 11
