Reputation: 2181
First off, my data set is shown below
What I'd like to do is group my data by the pickup_datetime hour. I've found related questions on here, but for some reason the solutions don't seem to work. I've included my attempts below.
I first started off with this:
df["dropoff_datetime"] = pd.to_datetime(df["dropoff_datetime"])
df["pickup_datetime"] = pd.to_datetime(df["pickup_datetime"])
test = df.groupby(df.hour).sum()
And I got the following error:
AttributeError: 'DataFrame' object has no attribute 'hour'
Then I tried this:
test = df.groupby(df.dropoff_datetime.hour).sum()
And I got the following error:
AttributeError: 'Series' object has no attribute 'hour'
I'm a bit confused because my situation seems to be the same as the one in the question linked above, so I'm not sure why I am getting errors. Any help would be much appreciated.
Upvotes: 3
Views: 5343
Reputation: 210862
We can use the Series.dt.hour accessor:
test = df.groupby(df['pickup_datetime'].dt.hour).sum()
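For a self-contained illustration of the line above, here is a minimal sketch on made-up data. The passenger_count column and its values are assumptions for the example, not taken from the question's data set. It sums a single numeric column, because recent pandas versions may refuse to sum datetime columns.

```python
import pandas as pd

# Hypothetical sample data; passenger_count is an assumed column name.
df = pd.DataFrame({
    "pickup_datetime": [
        "2017-08-01 13:05:00",
        "2017-08-01 13:45:00",
        "2017-08-01 20:20:00",
    ],
    "passenger_count": [1, 2, 4],
})

# Convert the strings to datetime64 so that the .dt accessor is available.
df["pickup_datetime"] = pd.to_datetime(df["pickup_datetime"])

# Group rows by the hour of pickup and sum one numeric column.
by_hour = df.groupby(df["pickup_datetime"].dt.hour)["passenger_count"].sum()
print(by_hour)
```

The result is a Series indexed by hour, with one row per distinct pickup hour.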
Here is an example showing the difference between a DatetimeIndex and a Series:
In [136]: times = pd.to_datetime(['2017-08-01 13:13:13', '2017-08-01 20:20:20'])
In [137]: times
Out[137]: DatetimeIndex(['2017-08-01 13:13:13', '2017-08-01 20:20:20'], dtype='datetime64[ns]', freq=None)
In [138]: type(times)
Out[138]: pandas.core.indexes.datetimes.DatetimeIndex
In [139]: times.hour
Out[139]: Int64Index([13, 20], dtype='int64')
As shown above, a DatetimeIndex has a "direct" .hour accessor, but a Series of datetime dtype has a .dt.hour accessor:
In [140]: df = pd.DataFrame({'Date': times})
In [141]: df
Out[141]:
                 Date
0 2017-08-01 13:13:13
1 2017-08-01 20:20:20
In [142]: type(df.Date)
Out[142]: pandas.core.series.Series
In [143]: df['Date'].dt.hour
Out[143]:
0 13
1 20
Name: Date, dtype: int64
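One caveat worth sketching: the .dt accessor only exists on a Series that already holds datetime-like values, which is why the pd.to_datetime conversion matters. The snippet below is an illustration on made-up strings, not part of the original session; the variable names are hypothetical.

```python
import pandas as pd

# Plain strings, not yet parsed as datetimes.
raw = pd.Series(["2017-08-01 13:13:13", "2017-08-01 20:20:20"])

# Accessing .dt on a non-datetime Series raises AttributeError.
try:
    raw.dt.hour
    dt_available = True
except AttributeError:
    dt_available = False

# After conversion, .dt.hour works as expected.
parsed = pd.to_datetime(raw)
hours = parsed.dt.hour.tolist()
print(dt_available, hours)
```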
If we set the Date column as the index:
In [146]: df.index = df['Date']
In [147]: df
Out[147]:
                                   Date
Date
2017-08-01 13:13:13 2017-08-01 13:13:13
2017-08-01 20:20:20 2017-08-01 20:20:20
the index becomes a DatetimeIndex:
In [149]: type(df.index)
Out[149]: pandas.core.indexes.datetimes.DatetimeIndex
so we can access the hour directly (without the .dt accessor) again:
In [148]: df.index.hour
Out[148]: Int64Index([13, 20], dtype='int64', name='Date')
Upvotes: 6
Reputation: 862851
You need .dt because you are working with a Series (Series.dt.hour):
test = df.groupby(df.dropoff_datetime.dt.hour).sum()
But if you have a DatetimeIndex, omit .dt and use DatetimeIndex.hour directly:
test = df.groupby(df.index.hour).sum()
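As a rough sketch of the index-based variant, assuming a frame whose index is already a DatetimeIndex (the fare column and its values are invented for the example):

```python
import pandas as pd

# Hypothetical frame with a DatetimeIndex; fare is an assumed column name.
df = pd.DataFrame(
    {"fare": [5.0, 7.5, 12.0]},
    index=pd.to_datetime([
        "2017-08-01 13:05:00",
        "2017-08-01 13:45:00",
        "2017-08-01 20:20:00",
    ]),
)

# The index exposes .hour directly, so no .dt is needed here.
by_hour = df.groupby(df.index.hour).sum()
print(by_hour)
```

Only numeric columns are present in this frame, so summing the whole frame is safe here.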
Upvotes: 1