Alfonso_MA

Reputation: 555

Grouping a pandas DataFrame by time interval + aggregate function

Let's say I have a pandas DataFrame like this:

2010-01-01 04:10:00:025     69
2010-01-01 04:10:01:669     1
2010-01-01 04:10:03:027     3
2010-01-01 04:10:04:003     8
2010-01-01 04:10:05:987     10
2010-01-01 04:10:06:330     99
2010-01-01 04:10:08:369     55
2010-01-01 04:10:09:987     5000
2010-01-01 04:10:11:148     13

And I need to convert it to the following format:

2010-01-01 04:10:00:000     69      69
2010-01-01 04:10:05:000     5000    10
2010-01-01 04:10:10:000     13      13

The first column is the start of each 5-second interval, beginning at 2010-01-01 04:10:00:000.

The second column is the max of all the grouped rows.

The third column is the first of all the grouped rows.

How can I get that?

Upvotes: 1

Views: 60

Answers (2)

Erfan

Reputation: 42946

Assuming you mean 5 seconds, we can use pd.Grouper with agg and max, first:

# use this line if your first column is not datetime type yet.
# df['col1'] = pd.to_datetime(df['col1'], format='%Y-%m-%d %H:%M:%S:%f')

df.groupby(pd.Grouper(key='col1', freq='5s'))['col2'].agg(['max', 'first']).reset_index()

Output

                 col1   max  first
0 2010-01-01 04:10:00    69     69
1 2010-01-01 04:10:05  5000     10
2 2010-01-01 04:10:10    13     13

Note: since you didn't provide column names, I called them col1 and col2.
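
For reference, here is a minimal end-to-end sketch of the approach above, built from the sample rows in the question. The column names col1 and col2 are the same placeholders used in this answer, and the explicit format string assumes the timestamps really use a colon before the milliseconds, as shown in the question.

```python
import pandas as pd

# Sample rows copied from the question; col1/col2 are placeholder names.
rows = [
    ("2010-01-01 04:10:00:025", 69),
    ("2010-01-01 04:10:01:669", 1),
    ("2010-01-01 04:10:03:027", 3),
    ("2010-01-01 04:10:04:003", 8),
    ("2010-01-01 04:10:05:987", 10),
    ("2010-01-01 04:10:06:330", 99),
    ("2010-01-01 04:10:08:369", 55),
    ("2010-01-01 04:10:09:987", 5000),
    ("2010-01-01 04:10:11:148", 13),
]
df = pd.DataFrame(rows, columns=["col1", "col2"])

# The milliseconds are separated by ':' rather than '.', so pass the format explicitly.
df["col1"] = pd.to_datetime(df["col1"], format="%Y-%m-%d %H:%M:%S:%f")

# Bucket rows into 5-second bins, then take the max and the first value per bin.
result = (
    df.groupby(pd.Grouper(key="col1", freq="5s"))["col2"]
    .agg(["max", "first"])
    .reset_index()
)
print(result)
```

The rows are assumed to be in chronological order already; 'first' returns the first row encountered in each bin, so unsorted input would need a sort_values('col1') beforehand.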

Upvotes: 3

adhg

Reputation: 10893

I'll assume your first column is a datetime column named 'date_time' and your second column is named 'value'. To get what you want, you can use "resample" like this:

   df.set_index('date_time').resample("5s").agg(['max','first'])

*note: I used 5 seconds because your data doesn't span 5 minutes

result

                             value
                             max     first
      date_time     
      2010-01-01 04:10:00    69      69
      2010-01-01 04:10:05    5000    10
      2010-01-01 04:10:10    13      13
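
If flat column names are preferred over the two-level header in the result above, one option is to select the column before resampling. This is a minimal sketch, assuming the column names date_time and value from this answer and the sample rows from the question; it uses 'first' for the third column, since that is what the question asks for.

```python
import pandas as pd

# Sample rows from the question; date_time/value are the names assumed in this answer.
stamps = [
    "2010-01-01 04:10:00:025", "2010-01-01 04:10:01:669", "2010-01-01 04:10:03:027",
    "2010-01-01 04:10:04:003", "2010-01-01 04:10:05:987", "2010-01-01 04:10:06:330",
    "2010-01-01 04:10:08:369", "2010-01-01 04:10:09:987", "2010-01-01 04:10:11:148",
]
values = [69, 1, 3, 8, 10, 99, 55, 5000, 13]

df = pd.DataFrame({
    "date_time": pd.to_datetime(stamps, format="%Y-%m-%d %H:%M:%S:%f"),
    "value": values,
})

# Selecting the 'value' Series first gives plain 'max' and 'first' columns
# instead of a MultiIndex header.
flat = df.set_index("date_time")["value"].resample("5s").agg(["max", "first"])
print(flat)
```

Calling reset_index() on the result turns the interval start back into an ordinary column if that is more convenient.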

Upvotes: 1
