Oliver Brace

Reputation: 470

How to groupby week starting at a particular time

I have data that I wish to group by week.

I have been able to do this using the following:

Data_Frame.groupby([pd.Grouper(freq='W')]).count()

This creates a DataFrame of the form:

2018-01-07  ...

2018-01-14  ...

2018-01-21  ...

which is great. However, I need each week to start at 06:00, so something like

2018-01-07 06:00:00 ...

2018-01-14 06:00:00 ...

2018-01-21 06:00:00 ...

I am aware that I could shift my data by 6 hours, but this seems like a cheat, and I'm pretty sure Grouper comes with the functionality to do this (some way of specifying when it should start grouping).

I was hoping someone knows of a good method of doing this.

Many Thanks

edit:

I'm trying to use Python's actual built-in functionality more, since it often works better and more consistently. I also turn the data into a graph with the timestamps as the y column, and I want the timestamps to actually reflect the data, without some workaround such as shifting everything by 6 hours, grouping, and then shifting everything back 6 hours to get the right timestamps.

Upvotes: 4

Views: 419

Answers (3)

Oliver Brace

Reputation: 470

To solve this, one needs to use the base parameter of Grouper.

However, the caveat is that base is expressed in the same units as whatever time period is used for freq (years, months, days, etc.), from what I can tell.

So, as I want to displace the starting position by 6 hours, my freq needs to be in hours rather than weeks (i.e. 1W = 168H).

So the solution I was looking for was

Data_Frame.groupby([pd.Grouper(freq='168H', base=6)]).count()

This is simple, short, quick and works exactly as I want it to.

Thanks to all the other answers, though.
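Note for later pandas versions: base was deprecated in pandas 1.1 in favour of the offset (and origin) arguments. A minimal sketch of the equivalent call, assuming pandas >= 1.1 and some illustrative hourly data (not the question's actual data):

```python
import numpy as np
import pandas as pd

# Illustrative hourly data spanning two weeks
idx = pd.date_range('2018-01-07', '2018-01-21', freq='h')
df = pd.DataFrame({'a': np.arange(len(idx))}, index=idx)

# base=6 is deprecated in pandas 1.1+; offset shifts the bin edges instead,
# so every bin label lands on a 06:00 boundary
out = df.groupby(pd.Grouper(freq='168h', offset='6h')).count()
print(out)
```

As with base, the bin edges are moved by 6 hours, so all group labels fall at 06:00.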

Upvotes: 1

jezrael

Reputation: 863541

Use double shift:

import numpy as np
import pandas as pd

np.random.seed(456)

idx = pd.date_range(start = '2018-01-07', end = '2018-01-09', freq = '2H')
df = pd.DataFrame({'a':np.random.randint(10, size=25)}, index=idx)
print (df)
                     a
2018-01-07 00:00:00  5
2018-01-07 02:00:00  9
2018-01-07 04:00:00  4
2018-01-07 06:00:00  5
2018-01-07 08:00:00  7
2018-01-07 10:00:00  1
2018-01-07 12:00:00  8
2018-01-07 14:00:00  3
2018-01-07 16:00:00  5
2018-01-07 18:00:00  2
2018-01-07 20:00:00  4
2018-01-07 22:00:00  2
2018-01-08 00:00:00  2
2018-01-08 02:00:00  8
2018-01-08 04:00:00  4
2018-01-08 06:00:00  8
2018-01-08 08:00:00  5
2018-01-08 10:00:00  6
2018-01-08 12:00:00  0
2018-01-08 14:00:00  9
2018-01-08 16:00:00  8
2018-01-08 18:00:00  2
2018-01-08 20:00:00  3
2018-01-08 22:00:00  6
2018-01-09 00:00:00  7

#freq='D' for easy check, in original use `W`
df1 = df.shift(-6, freq='H').groupby([pd.Grouper(freq='D')]).count().shift(6, freq='H')
print (df1)
                      a
2018-01-06 06:00:00   3
2018-01-07 06:00:00  12
2018-01-08 06:00:00  10

Upvotes: 1

user3256363

Reputation: 153

I would create another column with the required dates, and group by it:

import pandas as pd
import numpy as np

selected_datetime = pd.date_range(start = '2018-01-07', end = '2018-01-30', freq = '1H')
df = pd.DataFrame(selected_datetime, columns = ['date'])
df['value1'] = np.random.rand(df.shape[0])
# specify the condition for your date, eg. starting from 6am
df['shift1'] = df['date'].apply(lambda x: x.date() if x.hour == 6 else np.nan)
# forward-fill the NaN values with the most recent 06:00 date
df['shift1'] = df['shift1'].ffill()
# you can groupby on this col
df.groupby('shift1')['value1'].mean()
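The apply/ffill labelling above can also be sketched in vectorized form: shift each timestamp back 6 hours and floor to the day, so every row maps to the most recent 06:00 boundary. One caveat with this variant (an assumption, not part of the answer above): it also labels the rows before the first 06:00, which the NaN/ffill version silently drops from the groupby.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'date': pd.date_range('2018-01-07', '2018-01-30', freq='h')})
df['value1'] = np.random.rand(len(df))

# Shift back 6 hours, then floor to the day: each row's label is the
# calendar date on which its 06:00-to-06:00 window began
df['shift1'] = (df['date'] - pd.Timedelta(hours=6)).dt.floor('D')
out = df.groupby('shift1')['value1'].mean()
```

For example, a row at 2018-01-08 05:00 is labelled 2018-01-07 (its window began at 2018-01-07 06:00), while a row at 2018-01-08 06:00 is labelled 2018-01-08.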

Upvotes: 0
