Reputation: 234
There are multiple questions on plotting multiple graphs but not specifically for pandas and timelines.
I have a dataframe like the below:
Name | day_1_start | day_1_end | day_2_start | day_2_end |
---|---|---|---|---|
A | 1:00pm | 3:00pm | 3:30pm | 5:30pm |
B | 11:00am | 1:00pm | 3:45pm | 4:30pm |
C | 10:00am | 11:00am | 11:30am | 4:30pm |
I am trying to plot this as a Gantt chart/timeline. Using `plotly.express.timeline`, is it possible to have multiple `x_start` and `x_end` columns? Namely, can `day_1_start` and `day_2_start` both be used as `x_start`, and `day_1_end` and `day_2_end` both be used as `x_end`?
I believe I can also solve this by creating a new table (like the example below), but I am wondering whether it is possible without this transformation, as it is expensive.
Name | start | end | day |
---|---|---|---|
A | 1:00pm | 3:00pm | 1 |
A | 3:30pm | 5:30pm | 2 |
B | 11:00am | 1:00pm | 1 |
B | 3:45pm | 4:30pm | 2 |
This is roughly what I'd like to end up with, with day 1 and day 2 shown in different colors (from https://plotly.com/python/gantt/).
Upvotes: 1
Views: 1755
Reputation: 35230
Since two pairs of start/end columns cannot be treated as a single time series, the wide format needs to be converted to long format. I could not get `wide_to_long()` to keep multiple columns, so I built a data frame for each day, clumsy as that is, and concatenated them vertically. There may be other, more elegant ways to do this. Once the data is in long format, you can draw the timeline with `px.timeline()` or `ff.create_gantt()`. `ff.create_gantt()` expects specific column names (Task, Start, Finish, Resource), so I renamed the columns for it.
import io
import pandas as pd
data = '''
Name day_1_start day_1_end day_2_start day_2_end
A 1:00pm 3:00pm 3:30pm 5:30pm
B 11:00am 1:00pm 3:45pm 4:30pm
C 10:00am 11:00am 11:30am 4:30pm
'''
# Parse the whitespace-separated sample table (delim_whitespace is deprecated in recent pandas)
df = pd.read_csv(io.StringIO(data), sep=r'\s+')
# Build one frame per day, with a 'day' label and common 'start'/'end' column names
df_d1 = df[['Name','day_1_start','day_1_end']]
df_d1 = df_d1.assign(day='day_1').rename(columns={'day_1_start':'start','day_1_end':'end'})
df_d2 = df[['Name','day_2_start','day_2_end']]
df_d2 = df_d2.assign(day='day_2').rename(columns={'day_2_start':'start','day_2_end':'end'})
# Stack the per-day frames vertically to get the long format
df_long = pd.concat([df_d1,df_d2],axis=0, ignore_index=True)
# Convert the time strings to datetimes so plotly can place them on a time axis
df_long['start'] = pd.to_datetime(df_long['start'])
df_long['end'] = pd.to_datetime(df_long['end'])
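As a possible alternative to the per-day frames above, here is a minimal sketch of the same reshape with `pd.wide_to_long()`. It assumes every wide column is named exactly `day_<n>_start` or `day_<n>_end`, and it first renames them to `start_<n>` / `end_<n>` so that `start` and `end` can serve as stub names. The `renamed` variable and the regex are my own illustration, not part of the original answer, and the sketch leaves the times as strings.

```python
import pandas as pd

# Same sample data as in the question, in wide format
df = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "day_1_start": ["1:00pm", "11:00am", "10:00am"],
    "day_1_end": ["3:00pm", "1:00pm", "11:00am"],
    "day_2_start": ["3:30pm", "3:45pm", "11:30am"],
    "day_2_end": ["5:30pm", "4:30pm", "4:30pm"],
})

# Assumption: columns follow the pattern day_<n>_<start|end>.
# Rename them to <start|end>_<n>, the stub + separator + suffix layout
# that wide_to_long expects.
renamed = df.copy()
renamed.columns = df.columns.str.replace(
    r"^day_(\d+)_(start|end)$", r"\2_\1", regex=True
)

# Melt both stubs at once; the numeric suffix ends up in the 'day' column
df_long = (
    pd.wide_to_long(renamed, stubnames=["start", "end"], i="Name", j="day", sep="_")
    .reset_index()
    .sort_values(["day", "Name"], ignore_index=True)
)

# Turn the numeric suffix into the same labels used above ('day_1', 'day_2')
df_long["day"] = "day_" + df_long["day"].astype(str)

print(df_long)
```

This should scale to more days without writing one block per day, as long as the column naming assumption holds. The `pd.to_datetime` conversion shown above would still be needed before plotting.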
px.timeline
import plotly.express as px
import pandas as pd
fig = px.timeline(df_long, x_start="start", x_end="end", y="Name", color="day")
fig.update_yaxes(autorange="reversed")
fig.show()
ff.create_gantt
import plotly.figure_factory as ff
colors = {'day_1': 'rgb(220, 0, 0)', 'day_2': 'rgb(0, 255, 100)'}
df_long_ff = df_long.copy()
df_long_ff.columns = ['Task', 'Start', 'Finish', 'Resource']
fig = ff.create_gantt(df_long_ff, colors=colors, index_col='Resource', show_colorbar=True, group_tasks=True)
fig.show()
Upvotes: 0