Max

Reputation: 123

How to get a Gantt chart using matplotlib?

I have the following data:

data sample

I want to create a Gantt chart in Python that represents a timeline. I looked at another post about a similar problem (How to get gantt plot using matplotlib), but its code didn't work for me and I can't solve the issue on my own. It seems to have something to do with the data type of my "time" values. Here is the code:

import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('zpp00141_new.csv')
df.dropna(subset=['Latest finish / time', 'Earl. start / time'])
#error when I try to change data type of the columns to int
df["Latest finish / time"]= df["Latest finish / time"].astype(int) 
df["Earl. start / time"]= df["Earl. start / time"].astype(int)
#error below with data types
df["Diff"] = df['Latest finish / time'] - df['Earl. start / time']
color = {"In":"turquoise", "Out":"crimson"}
fig,ax=plt.subplots(figsize=(6,3))

labels=[]
for i, task in enumerate(df.groupby("Operation/Activity")):
    labels.append(task[0])
    for r in task[1].groupby("Operation short text"):
        data = r[1][["Earl. start / time", "Diff"]]
        ax.broken_barh(data.values, (i-0.4,0.8), color=color[r[0]] )

ax.set_yticks(range(len(labels)))
ax.set_yticklabels(labels) 
ax.set_xlabel("time [ms]")
plt.tight_layout()       
plt.show()

I tried to convert the columns from object to int, but that raised another error: "invalid literal for int() with base 10: '9:22:00 AM'". I would really appreciate any help, as I am quite new to programming in Python. If there is a simpler or better way to represent what I need, any tips would be welcome. Basically, I need a Gantt chart that shows each activity on a timeline from 7 am to 4:30 pm, with a vertical line over the chart marking the current time ("now").

Upvotes: 0

Views: 936

Answers (1)

JohanC

Reputation: 80574

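When the time strings are not in a standard format, datetime.strptime can be used to convert them. The code below also zero-pads the hour: it checks whether the second character of the string is ':' (a one-digit hour) and prepends a zero if so. Python's strptime normally accepts a non-padded hour for %I as well, so the padding is a precaution rather than a strict requirement.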
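As a minimal sketch of that conversion using only the standard library: the helper name parse_clock_time is hypothetical, and the first sample string is the one from the error message in the question. Once the strings are datetime objects, subtracting them works, which is what the "Diff" column in the question needs.

```python
from datetime import datetime

def parse_clock_time(timestr):
    """Parse a 12-hour clock string such as '9:22:00 AM' into a datetime.

    The string carries no date, so strptime fills in 1900-01-01.
    """
    # Prepend a zero when the hour has a single digit, e.g. '9:22:00 AM'
    padded = '0' + timestr if timestr[1] == ':' else timestr
    return datetime.strptime(padded, '%I:%M:%S %p')

start = parse_clock_time('9:22:00 AM')
finish = parse_clock_time('4:30:00 PM')
duration = finish - start  # a timedelta; int() is not needed for subtraction
print(start.time(), finish.time(), duration)
```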

Here is an example to get you started. I didn't grasp the code in the question, as some columns seem to be missing. Also, I changed the names of the columns to be compatible with variable names to be able to use row.start instead of row[1].
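The renaming could look roughly like this. The original column names are taken from the code in the question, the single data row is made up for illustration, and the short names on the right are only one possible choice.

```python
import pandas as pd

# One made-up row using the column names from the question
df = pd.DataFrame({'Earl. start / time': ['7:00:00 AM'],
                   'Latest finish / time': ['12:15:00 PM'],
                   'Operation/Activity': ['operation 1'],
                   'Operation short text': ['short text 1']})

# Rename to valid Python identifiers so itertuples() exposes them as attributes
df = df.rename(columns={'Earl. start / time': 'start',
                        'Latest finish / time': 'finish',
                        'Operation/Activity': 'operation',
                        'Operation short text': 'short_text'})

rows = [(row.operation, row.start, row.finish) for row in df.itertuples()]
print(rows)
```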

Colors can be assigned to each operation just by creating a list of them. Matplotlib has some built-in colormaps that can be used. For example, 'tab10' has 10 different colors. The list can be repeated if there aren't enough colors for each individual operation.
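To illustrate the repetition on its own, here is a small sketch with 12 made-up operation names, which is more than the 10 colors in 'tab10'. The names operations and color_of are just for this example.

```python
import math
import matplotlib.pyplot as plt

operations = [f'operation {i}' for i in range(1, 13)]  # 12 made-up operations
base_colors = list(plt.cm.tab10.colors)  # the 10 RGB tuples of 'tab10'

# Repeat the list often enough to cover every operation
repeats = math.ceil(len(operations) / len(base_colors))
colors = base_colors * repeats

# zip stops at the shorter sequence, so surplus colors are simply ignored
color_of = dict(zip(operations, colors))
```

With more operations than colors, some operations necessarily share a color: here 'operation 11' gets the same color as 'operation 1'.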

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from datetime import datetime
import math

def timestr_to_num(timestr):
    return mdates.date2num(datetime.strptime('0' + timestr if timestr[1] == ':' else timestr, '%I:%M:%S %p'))

df = pd.DataFrame({'start': ['7:00:00 AM', '1:00:00 PM', '7:20:00 AM', '2:00:00 PM'],
                   'finish': ['12:15:00 PM', '4:20:00 PM', '1:10:00 PM', '3:30:00 PM'],
                   'operation': ['operation 1', 'operation 1', 'operation 2', 'operation 3'],
                   'short_text': ['short text 1', 'short text 2', 'short text 1', 'short text 2']})
fig, ax = plt.subplots(figsize=(10, 3))
operations = pd.unique(df['operation'])
colors = plt.cm.tab10.colors  # get a list of 10 colors
colors *= math.ceil(len(operations) / (len(colors)))  # repeat the list as many times as needed
for operation, color in zip(operations, colors):
    for row in df[df['operation'] == operation].itertuples():
        left = timestr_to_num(row.start)
        right = timestr_to_num(row.finish)
        ax.barh(operation, left=left, width=right - left, height=0.8, color=color)
ax.set_xlim(timestr_to_num('07:00:00 AM'), timestr_to_num('4:30:00 PM'))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))  # display ticks as hours and minutes
ax.xaxis.set_major_locator(mdates.HourLocator(interval=1))  # set a tick every hour
plt.tight_layout()
plt.show()

example plot
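The question also asks for a vertical line marking the current time. A minimal sketch of one way to do that, assuming the bars are positioned with mdates.date2num on the date 1900-01-01 (the date strptime assigns to time-only strings): the helper clock_to_num is hypothetical and the single bar is made-up data.

```python
from datetime import datetime
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

def clock_to_num(dt):
    """Move dt onto 1900-01-01 (keeping its time of day) and convert it to a matplotlib date number."""
    return mdates.date2num(dt.replace(year=1900, month=1, day=1))

fig, ax = plt.subplots(figsize=(10, 3))

# One illustrative bar from 07:00 to 12:15
left = clock_to_num(datetime(1900, 1, 1, 7, 0))
right = clock_to_num(datetime(1900, 1, 1, 12, 15))
ax.barh('operation 1', left=left, width=right - left, height=0.8)

# Vertical line at the current time of day
now_num = clock_to_num(datetime.now())
now_line = ax.axvline(now_num, color='red', linestyle='--', label='now')

ax.set_xlim(clock_to_num(datetime(1900, 1, 1, 7, 0)),
            clock_to_num(datetime(1900, 1, 1, 16, 30)))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))
ax.legend()
```

Call plt.show() afterwards to display the figure. The line is only visible when the current time falls inside the x-limits, here between 07:00 and 16:30.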

Upvotes: 2
