Fluxy
Fluxy

Reputation: 2978

Group datetime values by datetime ranges and calculate the min and max values per range

I have the following DatetimeIndex values:

DatetimeIndex(['2021-01-18 01:32:00', '2021-01-18 01:33:00',
               '2021-01-18 01:34:00', '2021-01-18 01:35:00',
               '2021-01-18 01:36:00', '2021-01-18 01:37:00',
               '2021-12-16 12:07:00', '2021-12-16 12:08:00',
               '2021-12-16 12:09:00', '2021-12-16 12:10:00'],
              dtype='datetime64[ns]', length=10, freq=None)

I need to group them by datetime ranges and calculate the min and max values per range.

This is the expected result:

range  range_min               range_max
1      2021-01-18 01:32:00     2021-01-18 01:37:00
2      2021-12-16 12:07:00     2021-12-16 12:10:00

How can I do it?

I can get min and max across the complete set of values of timestamps, but I don't know how to group timestamps into ranges.

import numpy as np
import pandas as pd

pd.DataFrame(my_timestamps,columns=["timestamp"]).agg({"timestamp" : [np.min, np.max]})

Upvotes: 0

Views: 45

Answers (2)

Wizard.Ritvik
Wizard.Ritvik

Reputation: 11611

Here's a way to do it without pandas:

from pprint import pprint

L = ['2021-01-18 01:32:00', '2021-01-18 01:33:00',
     '2021-01-18 01:34:00', '2021-01-18 01:35:00',
     '2021-01-18 01:36:00', '2021-01-18 01:37:00',
     '2021-12-16 12:07:00', '2021-12-16 12:08:00',
     '2021-12-16 12:09:00', '2021-12-16 12:10:00']

ranges = []

r_min = L[0]
d_min = r_min.split(' ', 1)[0]
r_max = None

for dt in L[1:]:
    date = dt.split(' ', 1)[0]
    if date != d_min:
        ranges.append((r_min, r_max))
        r_min = dt
        d_min = r_min.split(' ', 1)[0]
    r_max = dt
ranges.append((r_min, r_max))

pprint(ranges)

Output:

[('2021-01-18 01:32:00', '2021-01-18 01:37:00'),
 ('2021-12-16 12:07:00', '2021-12-16 12:10:00')]

Upvotes: 1

mozway
mozway

Reputation: 260790

You can use a custom groupby.agg using the date as grouper with DatetimeIndex.normalize:

idx = pd.DatetimeIndex(['2021-01-18 01:32:00', '2021-01-18 01:33:00',
                        '2021-01-18 01:34:00', '2021-01-18 01:35:00',
                        '2021-01-18 01:36:00', '2021-01-18 01:37:00',
                        '2021-12-16 12:07:00', '2021-12-16 12:08:00',
                        '2021-12-16 12:09:00', '2021-12-16 12:10:00'],)


out = (idx.to_series().groupby(pd.factorize(idx.normalize())[0]+1)
          .agg(**{'range_min': 'min', 'range_max': 'max'})
          .rename_axis('range').reset_index()
      )

print(out)

Output:

   range           range_min           range_max
0      1 2021-01-18 01:32:00 2021-01-18 01:37:00
1      2 2021-12-16 12:07:00 2021-12-16 12:10:00

Upvotes: 1

Related Questions