Gabriel
Gabriel

Reputation: 42329

Pandas groupby for range of time strings

I have a pandas dataframe (stored in a .csv file) with the following format.

val,date,time
0.001,01JAN90,0:00:00
0.002,01JAN90,0:01:00
0.005,01JAN90,0:02:00
0.056,01JAN90,0:03:00
...
0.067,31DEC90,23:55:00
0.007,31DEC90,23:56:00
0.006,31DEC90,23:57:00
0.004,31DEC90,23:58:00
0.003,31DEC90,23:59:00

This is: a single float (val column) for each minute (time column) of each day (time column) in a year. I need to group the val elements throughout the entire year, that belong to a given hour range. I define 15 hour ranges as:

t_range = [['5:30:00', '6:30:00'], ['6:30:00', '7:30:00'], ...,
['19:30:00', '20:30:00']]

The answer given here Pandas Groupby Range of Values deals with ranges defined as floats, but my ranges are defined as strings.

My idea is that I'd need to first convert all the HH:MM:SS values in time to floats, and then apply the solution based on groupby and pd.cut. Is this the proper approach? How should I be using pandas to do this if not?

Upvotes: 0

Views: 1188

Answers (1)

MaxU - stand with Ukraine
MaxU - stand with Ukraine

Reputation: 210832

IIUC you can do it this way:

start = 5*60+30
end = 20*60+30
step = 60

df['ts'] = pd.to_datetime(df.date + ' ' + df.time, format='%d%b%y %H:%M:%S')
df['mins'] = df.ts.dt.hour*60 + df.ts.dt.minute

# filter out all "non-interesting" entries                           
x = df.query("@start <= mins <= @end")

bins = np.arange(start-step, end+step, step)
labels = ['({0[0]:02d}:{0[1]:02d}:00, {0[0]:02d}:{0[1]:02d}:00]'.format(divmod(x,60),
                                                                        divmod(x+step,60))
          for x in bins[:-1]]


x.groupby(pd.cut(x['mins'], bins=bins, labels=labels))['val'].sum().dropna()

Result:

In [164]: x.groupby(pd.cut(x['mins'], bins=bins, labels=labels))['val'].sum().dropna()
Out[164]:
mins
(05:30:00, 06:30:00]    0.006
(06:30:00, 07:30:00]    0.004
(07:30:00, 08:30:00]    0.003
(08:30:00, 09:30:00]    0.111
(09:30:00, 10:30:00]    0.001
(10:30:00, 11:30:00]    0.002
(11:30:00, 12:30:00]    0.005
(12:30:00, 13:30:00]    0.056
Name: val, dtype: float64

Source DF:

In [166]: df
Out[166]:
     val     date      time
0  0.067  01DEC90  04:00:00
1  0.007  01DEC90  05:00:00
2  0.006  01DEC90  06:00:00
3  0.004  01DEC90  07:00:00
4  0.003  01DEC90  08:00:00
5  0.111  01DEC90  09:00:00
6  0.001  01JAN90  10:00:00
7  0.002  01JAN90  11:00:00
8  0.005  01JAN90  12:00:00
9  0.056  01JAN90  13:00:00

Explanation:

bins: # of minutes

In [181]: bins
Out[181]: array([ 270,  330,  390,  450,  510,  570,  630,  690,  750,  810,  870,  930,  990, 1050, 1110, 1170, 1230])

labels

In [182]: labels
Out[182]:
['(04:30:00, 04:30:00]',
 '(05:30:00, 05:30:00]',
 '(06:30:00, 06:30:00]',
 '(07:30:00, 07:30:00]',
 '(08:30:00, 08:30:00]',
 '(09:30:00, 09:30:00]',
 '(10:30:00, 10:30:00]',
 '(11:30:00, 11:30:00]',
 '(12:30:00, 12:30:00]',
 '(13:30:00, 13:30:00]',
 '(14:30:00, 14:30:00]',
 '(15:30:00, 15:30:00]',
 '(16:30:00, 16:30:00]',
 '(17:30:00, 17:30:00]',
 '(18:30:00, 18:30:00]',
 '(19:30:00, 19:30:00]']

Upvotes: 3

Related Questions