Pandas: consolidate datetime intervals

I have a DataFrame representing the start and end times of some kind of event, e.g., when a machine in a factory has been stopped:

     machine_id            start_date              end_date
1             1   2016-10-01 00:00:00   2016-10-01 03:00:00
2             1   2016-10-03 05:30:00   2016-10-03 06:30:00
3             2   2016-10-03 23:30:00   2016-10-04 01:00:00
4             1   2016-10-04 05:00:00   2016-10-04 06:00:00
5             2   2016-10-04 05:50:00   2016-10-04 06:00:00
6             1   2016-10-05 18:50:00   2016-10-06 02:00:00
....

I need to get the total minutes in which any machine has been stopped, grouped by a given granularity: monthly, weekly, daily... For example, if the granularity is daily, the desired output would be:

        date       total_time
1 2016-10-01              180
2 2016-10-02                0
3 2016-10-03               90
4 2016-10-04              130
5 2016-10-05              310
6 2016-10-06              120
...

If the granularity were monthly, the output would just be:

        date       total_time
1 2016-10-01              830

My intuition says that it probably has something to do with resample(), but I don't know how to work with resample and two different columns.

Upvotes: 1

Views: 280

Answers (1)

lbolla

Reputation: 5411

How about using groupby?

In [67]: df = pandas.read_csv('data', parse_dates=[1, 2])

In [68]: for d, dfg in df.groupby(lambda i: df.end_date[i].date()):
    ...:     print(d, (dfg.end_date - dfg.start_date).sum())
    ...: 
2016-10-03 0 days 08:27:23
2016-10-04 1 days 14:00:04
2016-10-31 56 days 19:26:40.344080
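On the question's sample data, the same grouping can be written without the index lambda (a sketch; note that this attributes each interval's full duration to its end date, so a stop crossing midnight is not split between the two days):

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'machine_id': [1, 1, 2, 1, 2, 1],
    'start_date': pd.to_datetime([
        '2016-10-01 00:00:00', '2016-10-03 05:30:00', '2016-10-03 23:30:00',
        '2016-10-04 05:00:00', '2016-10-04 05:50:00', '2016-10-05 18:50:00']),
    'end_date': pd.to_datetime([
        '2016-10-01 03:00:00', '2016-10-03 06:30:00', '2016-10-04 01:00:00',
        '2016-10-04 06:00:00', '2016-10-04 06:00:00', '2016-10-06 02:00:00']),
})

# Duration of each stop in minutes, grouped by the date the stop ended.
minutes = (df['end_date'] - df['start_date']).dt.total_seconds() / 60
per_day = minutes.groupby(df['end_date'].dt.date).sum()
print(per_day)
```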

I am not sure what you mean by "when the machine has been stopped": presumably, you mean when a certain event has been triggered. If so, just filter df before the grouping above.
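If stops crossing a period boundary should be split, as the desired daily output implies (the stop from 2016-10-05 18:50 to 2016-10-06 02:00 counts as 310 + 120 minutes), one possible sketch cuts each row at period boundaries with `pandas.period_range` before summing. The helper name `stopped_minutes` is mine, not from the answer above:

```python
import pandas as pd

def stopped_minutes(df, freq='D'):
    """Total stopped minutes per period, splitting intervals at period boundaries."""
    rows = []
    for row in df.itertuples():
        # Every period (day, month, ...) that this interval touches.
        for p in pd.period_range(row.start_date, row.end_date, freq=freq):
            lo = max(row.start_date, p.start_time)
            hi = min(row.end_date, (p + 1).start_time)  # exclusive end of period p
            rows.append((p.start_time, (hi - lo).total_seconds() / 60))
    out = pd.DataFrame(rows, columns=['date', 'total_time'])
    return out.groupby('date', as_index=False)['total_time'].sum()

# Sample data from the question.
df = pd.DataFrame({
    'machine_id': [1, 1, 2, 1, 2, 1],
    'start_date': pd.to_datetime([
        '2016-10-01 00:00:00', '2016-10-03 05:30:00', '2016-10-03 23:30:00',
        '2016-10-04 05:00:00', '2016-10-04 05:50:00', '2016-10-05 18:50:00']),
    'end_date': pd.to_datetime([
        '2016-10-01 03:00:00', '2016-10-03 06:30:00', '2016-10-04 01:00:00',
        '2016-10-04 06:00:00', '2016-10-04 06:00:00', '2016-10-06 02:00:00']),
})

print(stopped_minutes(df))             # daily totals: 180, 90, 130, 310, 120
print(stopped_minutes(df, freq='M'))   # monthly: one row, 830.0
```

Days with no stops at all (like 2016-10-02 in the desired output) can be filled in afterwards, e.g. with `stopped_minutes(df).set_index('date').resample('D').sum()`.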

Upvotes: 1

Related Questions