Reputation: 10476
Say I have two pandas dataframes:
import pandas as pd
df1 = pd.DataFrame(
{
"Start": {
0: "2019-07-19 07:00:00",
1: "2019-07-19 08:00:00",
2: "2019-07-19 10:00:00",
},
"Finish": {
0: "2019-07-19 07:30:00",
1: "2019-07-19 08:30:00",
2: "2019-07-19 10:30:00",
},
}
)
df2 = pd.DataFrame(
{
"Start": {0: "2019-07-19 07:30:00", 1: "2019-07-19 08:15:00",},
"Finish": {0: "2019-07-19 08:00:00", 1: "2019-07-19 09:00:00",},
}
)
df1.Start = pd.to_datetime(df1.Start)
df2.Finish = pd.to_datetime(df2.Finish)
They look like this:
| | Start | Finish |
|---:|:--------------------|:--------------------|
| 0 | 2019-07-19 07:00:00 | 2019-07-19 07:30:00 |
| 1 | 2019-07-19 08:00:00 | 2019-07-19 08:30:00 |
| 2 | 2019-07-19 10:00:00 | 2019-07-19 10:30:00 |
| | Start | Finish |
|---:|:--------------------|:--------------------|
| 0 | 2019-07-19 07:30:00 | 2019-07-19 08:00:00 |
| 1 | 2019-07-19 08:15:00 | 2019-07-19 09:00:00 |
Here's what it looks if I plot them (colouring in the part between Start
and Finish
of each row):
Think of it as if df1
records times when TV1
was on, and df2
records times when TV2
was on. I would like to find the total amount of time that any TV was on. In the plot above, this is shown with the line df1 or df2
.
Here's how I made the plot:
import plotly.figure_factory as ff
df3 = pd.DataFrame(
{
"Start": {0: "2019-07-19 07:00:00", 1: "2019-07-19 10:00:00",},
"Finish": {0: "2019-07-19 09:00:00", 1: "2019-07-19 10:30:00",},
}
)
df1['Resource'] = ['df1']*3
df2['Resource'] = ['df2']*2
df3['Resource'] = ['df1 or df2']*2
df1['Task'] = ['df1']*3
df2['Task'] = ['df2']*2
df3['Task'] = ['df1 or df2']*2
fig = ff.create_gantt(
pd.concat([df1, df2, df3]).reset_index(drop=True),
group_tasks=True,
index_col="Resource",
)
fig.show()
and here's out I printed out the Dataframes:
from tabulate import tabulate
print(df1.pipe(tabulate, headers="keys", tablefmt="pipe"))
print(df2.pipe(tabulate, headers="keys", tablefmt="pipe"))
Upvotes: 4
Views: 102
Reputation: 25895
Note, the input here is based on the original question.
I'm not sure if this can be done nicely since you are always comparing rows, but one way:
df1['start_time'] = pd.to_datetime(df1['start_time'])
df2['start_time'] = pd.to_datetime(df2['start_time'])
df1['end_time'] = pd.to_datetime(df1['end_time'])
df2['end_time'] = pd.to_datetime(df2['end_time'])
all_events = pd.concat((df1, df2)).sort_values('start_time')
result = all_events.iloc[0:1].copy()
for _, row in all_events.iterrows():
if row['start_time'] <= result['end_time'].iloc[-1]:
if row['end_time'] > result['end_time'].iloc[-1]:
result['end_time'].iloc[-1] = row['end_time']
else:
result = result.append(row, ignore_index=True)
print(all_events)
print(result)
The starting part is just to let Pandas handle my time comparison by itself. The basics:
Results for your table (sorted all events and the result):
end_time start_time
0 2019-07-19 06:07:10 2019-07-19 06:04:57
1 2019-07-19 06:27:41 2019-07-19 06:26:33
2 2019-07-19 06:35:43 2019-07-19 06:33:18
0 2019-07-19 06:35:53 2019-07-19 06:34:56
1 2019-07-19 06:37:45 2019-07-19 06:36:44
2 2019-07-19 06:40:11 2019-07-19 06:38:33
3 2019-07-19 06:40:25 2019-07-19 06:38:37
4 2019-07-19 07:02:20 2019-07-19 06:59:48
3 2019-07-19 07:06:47 2019-07-19 07:01:20
4 2019-07-19 07:09:19 2019-07-19 07:07:17
end_time start_time
0 2019-07-19 06:07:10 2019-07-19 06:04:57
1 2019-07-19 06:27:41 2019-07-19 06:26:33
2 2019-07-19 06:35:53 2019-07-19 06:33:18
3 2019-07-19 06:37:45 2019-07-19 06:36:44
4 2019-07-19 06:40:25 2019-07-19 06:38:33
5 2019-07-19 07:06:47 2019-07-19 06:59:48
6 2019-07-19 07:09:19 2019-07-19 07:07:17
The actual deltas are just:
>>> print(result['end_time'] - result['start_time'])
0 00:02:13
1 00:01:08
2 00:02:35
3 00:01:01
4 00:01:52
5 00:06:59
6 00:02:02
Upvotes: 2