Reputation: 191
I have four columns in my dataset. One of the columns is a timestamp in epoch format, which I converted to datetime using pd.to_datetime, and I would like to get the count of each group per minute.
The dataset is something like this
timestamp col2 col3 col4
1555127370.734 x a p
1555127370.980 y b q
1555127371.376 z c r
1555127371.430 x a p
1555127371.313 w g t
1555127371.748 z h u
So I need them grouped per minute, per col2, per col3, and per col4, along with the count of each group.
Is there a way to do this in pandas?
EDIT: The output should look like this:
1555127340 x a p 2
1555127340 y b q 1
1555127340 z c r 1
1555127340 w g t 1
1555127340 z h u 1
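For reproducibility, a minimal sketch that rebuilds this sample frame (values taken from the table above):
import pandas as pd

df = pd.DataFrame({
    'timestamp': [1555127370.734, 1555127370.980, 1555127371.376,
                  1555127371.430, 1555127371.313, 1555127371.748],
    'col2': ['x', 'y', 'z', 'x', 'w', 'z'],
    'col3': ['a', 'b', 'c', 'a', 'g', 'h'],
    'col4': ['p', 'q', 'r', 'p', 't', 'u'],
})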
Thank you in advance
Upvotes: 1
Views: 377
Reputation: 862511
Use DataFrame.resample with Resampler.size:
df['timestamp'] = pd.to_datetime(df['timestamp'], unit='s')
df = df.resample('min', on='timestamp').size().reset_index(name='count')
print (df)
timestamp count
0 2019-04-13 03:49:00 6
An alternative is DataFrameGroupBy.size:
df = df.groupby(pd.Grouper(freq='min', key='timestamp')).size().reset_index(name='count')
Another approach is to remove the seconds with Series.dt.floor and pass the result to groupby:
df = df.groupby(df['timestamp'].dt.floor('min')).size().reset_index(name='count')
print (df)
timestamp count
0 2019-04-13 03:49:00 6
If you need to group by multiple columns, pass a list:
df['timestamp'] = pd.to_datetime(df['timestamp'], unit='s')
df1 = (df.groupby([pd.Grouper(freq='min', key='timestamp'), 'col2', 'col3', 'col4'],sort=False)
.size()
.reset_index(name='count'))
print (df1)
timestamp col2 col3 col4 count
0 2019-04-13 03:49:00 x a p 2
1 2019-04-13 03:49:00 y b q 1
2 2019-04-13 03:49:00 z c r 1
3 2019-04-13 03:49:00 w g t 1
4 2019-04-13 03:49:00 z h u 1
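If the result should show the floored minute as epoch seconds, as in the expected output in the question, the datetime column can be converted back. A minimal sketch, assuming a timezone-naive datetime64[ns] column:
# convert the per-minute datetime back to epoch seconds
# (e.g. 2019-04-13 03:49:00 -> 1555127340)
df1['timestamp'] = (df1['timestamp'] - pd.Timestamp('1970-01-01')) // pd.Timedelta('1s')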
Upvotes: 1
Reputation: 71560
Try the code below:
print(df.groupby(df['timestamp'].map(lambda x: x.minute)).count())
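# note: x.minute extracts only the minute within the hour (0-59), so rows from
# different hours or days with the same minute value fall into the same group,
# and col2/col3/col4 are not part of the grouping here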
Upvotes: 0