Reputation: 3376
I have a DataFrame representing the start and end time of some kind of event, e.g., when a machine in a factory has been stopped:
   machine_id          start_date            end_date
1           1 2016-10-01 00:00:00 2016-10-01 03:00:00
2           1 2016-10-03 05:30:00 2016-10-03 06:30:00
3           2 2016-10-03 23:30:00 2016-10-04 01:00:00
4           1 2016-10-04 05:00:00 2016-10-04 06:00:00
5           2 2016-10-04 05:50:00 2016-10-04 06:00:00
6           1 2016-10-05 18:50:00 2016-10-06 02:00:00
...
I need to get the total minutes in which any machine has been stopped, grouped by a given granularity: monthly, weekly, daily... For example, if the granularity is daily, the desired output would be:
        date  total_time
1 2016-10-01         180
2 2016-10-02           0
3 2016-10-03          90
4 2016-10-04         130
5 2016-10-05         310
6 2016-10-06         120
...
If the granularity were monthly, the output would just be:
        date  total_time
1 2016-10-01         830
My intuition says that it probably has something to do with resample(), but I don't know how to use resample with two different columns.
Upvotes: 1
Views: 280
Reputation: 5411
How about using groupby?
In [67]: df = pandas.read_csv('data', parse_dates=[1, 2])

In [68]: for d, dfg in df.groupby(lambda i: df.end_date[i].date()):
    ...:     print(d, (dfg.end_date - dfg.start_date).sum())
    ...:
2016-10-03 0 days 08:27:23
2016-10-04 1 days 14:00:04
2016-10-31 56 days 19:26:40.344080
I am not sure what you mean by "when the machine has been stopped": presumably, you mean when a certain event has been triggered. If so, just filter df before the above grouping.
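Note that grouping by end date alone attributes a whole interval to the day it ends, which miscounts intervals that cross midnight (e.g. row 3 above). A sketch of an alternative (my own, not part of the original answer, and assuming pandas >= 1.4 for the `inclusive=` keyword of `date_range`): expand each interval into its individual minutes, then resample at whatever granularity you need.

```python
import pandas as pd

# Sample data from the question (machine stop intervals).
df = pd.DataFrame({
    "machine_id": [1, 1, 2, 1, 2, 1],
    "start_date": pd.to_datetime([
        "2016-10-01 00:00:00", "2016-10-03 05:30:00", "2016-10-03 23:30:00",
        "2016-10-04 05:00:00", "2016-10-04 05:50:00", "2016-10-05 18:50:00",
    ]),
    "end_date": pd.to_datetime([
        "2016-10-01 03:00:00", "2016-10-03 06:30:00", "2016-10-04 01:00:00",
        "2016-10-04 06:00:00", "2016-10-04 06:00:00", "2016-10-06 02:00:00",
    ]),
})

# One entry per stopped minute; inclusive="left" so an interval ending at
# 06:00 contributes minutes up to 05:59. Intervals that cross midnight are
# thereby split across the correct days automatically.
minutes = pd.concat(
    pd.Series(1, index=pd.date_range(row.start_date, row.end_date,
                                     freq="min", inclusive="left"))
    for row in df.itertuples()
)

# Resample at the desired granularity: "D" daily, "W" weekly, "MS" monthly.
daily = minutes.resample("D").sum()
print(daily)
```

With the sample data this reproduces the daily totals from the question (180, 0, 90, 130, 310, 120), and `minutes.resample("MS").sum()` gives the monthly total of 830. Expanding to per-minute rows is memory-hungry for long intervals; on older pandas, replace `inclusive="left"` with `closed="left"`.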
Upvotes: 1