Reputation: 33
I have a python dataframe (df1
) which has a column time. I converted the column into a datetime series using pd.to_datetime(df1['time'])
. Now I get a column like this:
2016-08-24 00:00:00 2016-08-13 00:00:00
2016-08-24 00:00:00 2016-08-13 00:00:00
2016-08-24 00:00:00 2016-08-13 00:00:00
2016-08-24 00:00:00 2016-08-13 00:00:00
2016-08-24 00:00:01 2016-08-13 00:00:01
2016-08-24 00:00:01 2016-08-13 00:00:01
2016-08-24 00:00:02 2016-08-13 00:00:02
2016-08-24 00:00:02 2016-08-13 00:00:02
2016-08-24 00:00:02 2016-08-13 00:00:02
2016-08-24 00:00:02 2016-08-13 00:00:02
2016-08-24 00:00:02 2016-08-13 00:00:02
2016-08-24 00:00:02 2016-08-13 00:00:02
2016-08-24 00:00:02 2016-08-13 00:00:02
2016-08-24 00:00:02 2016-08-13 00:00:02
2016-08-24 00:00:02 2016-08-13 00:00:02
....
2016-08-24 23:59:59 2016-08-13 00:00:02
Essentially, I want the first column to be grouped by the hour, so that I can see how many entries are there in 1 hour. Any help will be great.
Upvotes: 3
Views: 641
Reputation: 862511
Use resample
:
#pandas version 0.18.0 and higher
df = df.resample('H').size()
#pandas version below 0.18.0
#df = df.resample('H', 'size')
print (df)
2016-08-24 00:00:00 1
2016-08-24 01:00:00 3
2016-08-24 02:00:00 1
Freq: H, dtype: int64
If need output as DataFrame
:
df = df.resample('H').size().rename('count').to_frame()
print (df)
count
2016-08-24 00:00:00 1
2016-08-24 01:00:00 3
2016-08-24 02:00:00 1
Or you can remove from DatetimeIndex
minutes
and seconds
by converting to <M8[h]
and then aggregating size
:
import pandas as pd
df = pd.DataFrame({'time': {pd.Timestamp('2016-08-24 01:00:00'): pd.Timestamp('2016-08-13 00:00:00'), pd.Timestamp('2016-08-24 01:00:01'): pd.Timestamp('2016-08-13 00:00:01'), pd.Timestamp('2016-08-24 01:00:02'): pd.Timestamp('2016-08-13 00:00:02'), pd.Timestamp('2016-08-24 02:00:02'): pd.Timestamp('2016-08-13 00:00:02'), pd.Timestamp('2016-08-24 00:00:00'): pd.Timestamp('2016-08-13 00:00:00')}})
print (df)
time
2016-08-24 00:00:00 2016-08-13 00:00:00
2016-08-24 01:00:00 2016-08-13 00:00:00
2016-08-24 01:00:01 2016-08-13 00:00:01
2016-08-24 01:00:02 2016-08-13 00:00:02
2016-08-24 02:00:02 2016-08-13 00:00:02
df= df.groupby([df.index.values.astype('<M8[h]')]).size()
print (df)
2016-08-24 00:00:00 1
2016-08-24 01:00:00 3
2016-08-24 02:00:00 1
dtype: int64
Upvotes: 2
Reputation: 25629
Using @jezrael setup.
df.resample(rule='H', how='count').rename(columns = {'time':'count'})
count
2016-08-24 00:00:00 1
2016-08-24 01:00:00 3
2016-08-24 02:00:00 1
Upvotes: 3
Reputation: 630
You can use pandas.DatetimeIndex
as follows.
import numpy as np
import pandas as pd
# An example of time period
drange = pd.date_range('2016-08-01 00:00:00', '2016-09-01 00:00:00',
freq='10min')
N = len(drange)
# The number of columns without 'time' is three.
df = pd.DataFrame(np.random.rand(N, 3))
df['time'] = drange
time_col = pd.DatetimeIndex(df['time'])
gb = df.groupby([time_col.year,
time_col.month,
time_col.day,
time_col.hour])
for col_name, gr in gb:
print(gr) # If you want to see only the length, use print(len(gr))
[References] Python Pandas: Group datetime column into hour and minute aggregations
Upvotes: 1