David Chouinard

Reputation: 6836

Grouping Pandas DataFrame by date

I have a Pandas DataFrame that includes a date column. Elements of that column are of type pandas.tslib.Timestamp.

I'd like to group the DataFrame by date, ignoring any timestamp information more granular than the date (i.e. all rows from Feb 23, 2011 should fall into one group). I know how to express this in SQL, but am quite new to Pandas.

This question does something very similar, but I don't understand the code and it uses datetime objects.

From the documentation, I don't even understand how to retrieve the date from a Pandas Timestamp object. I could convert it to a datetime object, but that seems very roundabout.


As requested, the output of df.head():

                 date              show network  timed                            session_id
0 2011-12-03 02:48:52              Monk    TV38    670  00003DA9-01D2-E7A9-4177-203BE6A9E2BA
1 2011-12-03 03:00:09          WBZ News    TV38    205  00003DA9-01D2-E7A9-4177-203BE6A9E2BA
2 2011-12-03 03:04:04      Dateline NBC     NBC     30  00003DA9-01D2-E7A9-4177-203BE6A9E2BA
3 2011-12-03 03:04:35             20/20     ABC     25  00003DA9-01D2-E7A9-4177-203BE6A9E2BA
4 2011-12-03 03:04:56  College Football     FOX     55  00003DA9-01D2-E7A9-4177-203BE6A9E2BA

Upvotes: 10

Views: 16544

Answers (2)

Andy Hayden

Reputation: 375377

You can use the DatetimeIndex normalize method, which resets each timestamp to midnight of the same day:

In [11]: df['date']
Out[11]: 
0   2011-12-03 02:48:52
1   2011-12-03 03:00:09
2   2011-12-03 03:04:04
3   2011-12-03 03:04:35
4   2011-12-03 03:04:56
Name: date, dtype: datetime64[ns]

In [12]: pd.DatetimeIndex(df['date']).normalize()
Out[12]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2011-12-03 00:00:00, ..., 2011-12-03 00:00:00]
Length: 5, Freq: None, Timezone: None

You can then group by the result:

g = df.groupby(pd.DatetimeIndex(df['date']).normalize())

From pandas 0.15 onward, a datetime Series has the dt accessor, so you can write this as:

g = df.groupby(df['date'].dt.normalize())
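
To make the grouping concrete, here is a minimal sketch that builds a small frame and sums timed per day. The sample rows are an assumption: they resemble the question's df.head() but include a second day (not in the original data) so that more than one group appears.

```python
import pandas as pd

# Hypothetical sample: similar to the question's data, with one row moved
# to a second day so that the grouping produces two groups.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2011-12-03 02:48:52",
        "2011-12-03 03:00:09",
        "2011-12-04 03:04:04",
    ]),
    "show": ["Monk", "WBZ News", "Dateline NBC"],
    "timed": [670, 205, 30],
})

# normalize() keeps the datetime64 dtype but sets every time to 00:00:00.
day = df["date"].dt.normalize()

# Group on the normalized Series and aggregate a single column per day.
timed_per_day = df.groupby(day)["timed"].sum()
print(timed_per_day)
```

Because the group keys are still Timestamps, the result keeps a datetime index, which is convenient if you want to resample or plot it afterwards.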

Upvotes: 21

Patrick Russell

Reputation: 73

It's not clear if you are trying to groupby and aggregate (like in SQL) or create an index with a date instead of a timestamp.

If you're trying to group by and aggregate, you could do it this way:

df.groupby(df.set_index('date').index.date).mean()

Time series indexes expose datetime properties such as day and date. The call above aggregates the timed column, since it's the only numeric column.
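
As a hedged variation on the same idea, the sketch below groups on the date property through the dt accessor and selects timed explicitly. Selecting the column is a choice made here, not part of the original answer: in recent pandas versions, calling mean() on a frame that still contains string columns can raise instead of silently skipping them. The sample rows are assumed for illustration.

```python
import datetime
import pandas as pd

# Hypothetical sample rows spanning two days.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2011-12-03 02:48:52",
        "2011-12-03 03:00:09",
        "2011-12-04 03:04:04",
    ]),
    "show": ["Monk", "WBZ News", "Dateline NBC"],
    "timed": [670, 205, 30],
})

# .dt.date yields plain datetime.date objects (object dtype), one per row.
day = df["date"].dt.date

# Select the numeric column before aggregating so string columns are not involved.
mean_per_day = df.groupby(day)["timed"].mean()
print(mean_per_day)
```

Note that the group keys here are datetime.date objects rather than Timestamps, so the resulting index is a plain object index, not a DatetimeIndex.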

If you're trying to create an index with a date level, you could do something like:

import datetime
df.set_index(['date', df.date.apply(lambda x: datetime.datetime.date(x))], inplace=True)
df.index.names = ['timestamp', 'daydate']

This will give you a MultiIndex with the timestamp and a date. If you don't want to modify df itself, drop the inplace= argument; set_index then returns a new DataFrame and leaves the original unchanged.
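
A minimal sketch of the non-inplace variant, assuming a small sample frame. The variable names daydate and indexed are made up for illustration.

```python
import pandas as pd

# Hypothetical sample rows spanning two days.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2011-12-03 02:48:52",
        "2011-12-03 03:00:09",
        "2011-12-04 03:04:04",
    ]),
    "show": ["Monk", "WBZ News", "Dateline NBC"],
    "timed": [670, 205, 30],
})

# Build the date-only level as a separate Series.
daydate = df["date"].dt.date.rename("daydate")

# Without inplace=True, set_index returns a new frame and df is left untouched.
indexed = df.set_index(["date", daydate])
indexed.index.names = ["timestamp", "daydate"]

print(indexed)
print(df.columns.tolist())  # the original frame still has its 'date' column
```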

Upvotes: 6
