Abhishek

Reputation: 191

Aggregating data per minute on multiple fields

I have four columns in my dataset, one of which is a timestamp in epoch seconds that I converted to datetime using pd.to_datetime. I would like to get the count of each group per minute.

The dataset looks something like this:

timestamp col2 col3 col4
1555127370.734 x a  p
1555127370.980 y b  q
1555127371.376 z c  r
1555127371.430 x a  p
1555127371.313 w g  t
1555127371.748 z h  u
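To make the example reproducible, the sample data above can be built as a DataFrame like this (a minimal sketch; the column names and values are copied from the sample, with timestamps kept as raw epoch seconds):

```python
import pandas as pd

# Sample data from the question; 'timestamp' holds epoch seconds as floats
df = pd.DataFrame({
    'timestamp': [1555127370.734, 1555127370.980, 1555127371.376,
                  1555127371.430, 1555127371.313, 1555127371.748],
    'col2': ['x', 'y', 'z', 'x', 'w', 'z'],
    'col3': ['a', 'b', 'c', 'a', 'g', 'h'],
    'col4': ['p', 'q', 'r', 'p', 't', 'u'],
})
print(df)
```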

So I need the rows grouped per minute and per col2, col3 and col4, with the count of each group.

Is there a way to do this in pandas?

EDIT: The output should look like this:

1555127340 x a p 2
1555127340 y b q 1
1555127340 z c r 1
1555127340 w g t 1
1555127340 z h u 1
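The 1555127340 in the expected output is each timestamp floored to the start of its minute, i.e. t - (t mod 60). A quick check of that arithmetic in plain Python, using the sample timestamps:

```python
# Epoch timestamps (seconds) from the sample data
ts = [1555127370.734, 1555127370.980, 1555127371.376,
      1555127371.430, 1555127371.313, 1555127371.748]

# Floor each timestamp to the start of its minute: int(t // 60) * 60
minute_start = [int(t // 60) * 60 for t in ts]
print(minute_start)
```

All six rows fall into the same 60-second bucket, which is why every line of the expected output starts with 1555127340.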

Thank you in advance

Upvotes: 1

Views: 377

Answers (2)

jezrael

Reputation: 862511

Use DataFrame.resample with Resampler.size:

df['timestamp'] = pd.to_datetime(df['timestamp'], unit='s')

df = df.resample('min', on='timestamp').size().reset_index(name='count')
print(df)
            timestamp  count
0 2019-04-13 03:49:00      6

An alternative is DataFrameGroupBy.size with pd.Grouper:

df = df.groupby(pd.Grouper(freq='min', key='timestamp')).size().reset_index(name='count')

Another approach is to remove the seconds with Series.dt.floor and pass the result to groupby:

df = df.groupby(df['timestamp'].dt.floor('min')).size().reset_index(name='count')
print(df)
            timestamp  count
0 2019-04-13 03:49:00      6

If you need to group by multiple columns, pass a list:

df['timestamp'] = pd.to_datetime(df['timestamp'], unit='s')
df1 = (df.groupby([pd.Grouper(freq='min', key='timestamp'), 'col2', 'col3', 'col4'], sort=False)
         .size()
         .reset_index(name='count'))
print(df1)
            timestamp col2 col3 col4  count
0 2019-04-13 03:49:00    x    a    p      2
1 2019-04-13 03:49:00    y    b    q      1
2 2019-04-13 03:49:00    z    c    r      1
3 2019-04-13 03:49:00    w    g    t      1
4 2019-04-13 03:49:00    z    h    u      1
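The question asks for the minute as epoch seconds rather than as a datetime. A sketch of converting the timestamp column of df1 back, assuming naive datetimes as produced by pd.to_datetime(..., unit='s'); subtracting the Unix epoch and floor-dividing by one second avoids depending on the internal datetime64 resolution:

```python
import pandas as pd

df = pd.DataFrame({
    'timestamp': [1555127370.734, 1555127370.980, 1555127371.376,
                  1555127371.430, 1555127371.313, 1555127371.748],
    'col2': ['x', 'y', 'z', 'x', 'w', 'z'],
    'col3': ['a', 'b', 'c', 'a', 'g', 'h'],
    'col4': ['p', 'q', 'r', 'p', 't', 'u'],
})
df['timestamp'] = pd.to_datetime(df['timestamp'], unit='s')

# Count rows per minute and per (col2, col3, col4), as in the answer
df1 = (df.groupby([pd.Grouper(freq='min', key='timestamp'), 'col2', 'col3', 'col4'], sort=False)
         .size()
         .reset_index(name='count'))

# Convert the minute buckets back to integer epoch seconds
df1['timestamp'] = (df1['timestamp'] - pd.Timestamp('1970-01-01')) // pd.Timedelta('1s')
print(df1)
```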

Upvotes: 1

U13-Forward

Reputation: 71560

Try the code below (it assumes timestamp has already been converted with pd.to_datetime):

print(df.groupby(df['timestamp'].dt.minute).count())
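Note that grouping on the minute component alone uses the minute of the hour (0 to 59), so rows from different hours that share the same minute number end up in one group, and col2 to col4 are not part of the key. A small sketch with two hypothetical timestamps exactly one hour apart shows the difference compared with flooring to the minute:

```python
import pandas as pd

# Two hypothetical epoch timestamps exactly one hour apart
s = pd.to_datetime(pd.Series([1555127370, 1555127370 + 3600]), unit='s')

# Minute of the hour: both rows have minute 49, so they share one group
by_minute_of_hour = s.groupby(s.dt.minute).size()

# Floor to the minute: 03:49 and 04:49 stay in separate groups
by_floored_minute = s.groupby(s.dt.floor('min')).size()

print(by_minute_of_hour)
print(by_floored_minute)
```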

Upvotes: 0
