Reputation: 42329
I have a pandas
dataframe (stored in a .csv
file) with the following format.
val,date,time
0.001,01JAN90,0:00:00
0.002,01JAN90,0:01:00
0.005,01JAN90,0:02:00
0.056,01JAN90,0:03:00
...
0.067,31DEC90,23:55:00
0.007,31DEC90,23:56:00
0.006,31DEC90,23:57:00
0.004,31DEC90,23:58:00
0.003,31DEC90,23:59:00
This is: a single float (val
column) for each minute (time
column) of each day (time
column) in a year. I need to group the val
elements throughout the entire year, that belong to a given hour range. I define 15 hour ranges as:
t_range = [['5:30:00', '6:30:00'], ['6:30:00', '7:30:00'], ...,
['19:30:00', '20:30:00']]
The answer given here Pandas Groupby Range of Values deals with ranges defined as floats, but my ranges are defined as strings.
My idea is that I'd need to first convert all the HH:MM:SS values in time
to floats, and then apply the solution based on groupby and pd.cut. Is this the proper approach? How should I be using pandas
to do this if not?
Upvotes: 0
Views: 1188
Reputation: 210832
IIUC you can do it this way:
start = 5*60+30
end = 20*60+30
step = 60
df['ts'] = pd.to_datetime(df.date + ' ' + df.time, format='%d%b%y %H:%M:%S')
df['mins'] = df.ts.dt.hour*60 + df.ts.dt.minute
# filter out all "non-interesting" entries
x = df.query("@start <= mins <= @end")
bins = np.arange(start-step, end+step, step)
labels = ['({0[0]:02d}:{0[1]:02d}:00, {0[0]:02d}:{0[1]:02d}:00]'.format(divmod(x,60),
divmod(x+step,60))
for x in bins[:-1]]
x.groupby(pd.cut(x['mins'], bins=bins, labels=labels))['val'].sum().dropna()
Result:
In [164]: x.groupby(pd.cut(x['mins'], bins=bins, labels=labels))['val'].sum().dropna()
Out[164]:
mins
(05:30:00, 06:30:00] 0.006
(06:30:00, 07:30:00] 0.004
(07:30:00, 08:30:00] 0.003
(08:30:00, 09:30:00] 0.111
(09:30:00, 10:30:00] 0.001
(10:30:00, 11:30:00] 0.002
(11:30:00, 12:30:00] 0.005
(12:30:00, 13:30:00] 0.056
Name: val, dtype: float64
Source DF:
In [166]: df
Out[166]:
val date time
0 0.067 01DEC90 04:00:00
1 0.007 01DEC90 05:00:00
2 0.006 01DEC90 06:00:00
3 0.004 01DEC90 07:00:00
4 0.003 01DEC90 08:00:00
5 0.111 01DEC90 09:00:00
6 0.001 01JAN90 10:00:00
7 0.002 01JAN90 11:00:00
8 0.005 01JAN90 12:00:00
9 0.056 01JAN90 13:00:00
Explanation:
bins: # of minutes
In [181]: bins
Out[181]: array([ 270, 330, 390, 450, 510, 570, 630, 690, 750, 810, 870, 930, 990, 1050, 1110, 1170, 1230])
labels
In [182]: labels
Out[182]:
['(04:30:00, 04:30:00]',
'(05:30:00, 05:30:00]',
'(06:30:00, 06:30:00]',
'(07:30:00, 07:30:00]',
'(08:30:00, 08:30:00]',
'(09:30:00, 09:30:00]',
'(10:30:00, 10:30:00]',
'(11:30:00, 11:30:00]',
'(12:30:00, 12:30:00]',
'(13:30:00, 13:30:00]',
'(14:30:00, 14:30:00]',
'(15:30:00, 15:30:00]',
'(16:30:00, 16:30:00]',
'(17:30:00, 17:30:00]',
'(18:30:00, 18:30:00]',
'(19:30:00, 19:30:00]']
Upvotes: 3