Reputation: 389
I am trying to build a highly customized chart using Matplotlib (I am open to using any other library).
The data that I have looks like this
ItemID | ItemPhase | ItemStatus | ItemOutcome | Date
1 | Phase1 | Complete | In | 01-02-2011
2 | Phase2 | WIP | WIP | 01-03-2014
3 | Phase1 | Complete | Out | 05-02-2010
4 | Phase3 | WIP | WIP | 01-04-2015
5 | Phase2 | Complete | In | 01-05-2012
6 | Phase2 | WIP | WIP | 01-02-2013
7 | Phase3 | Complete | In | 01-06-2015
8 | Phase2 | Complete | Out | 01-07-2013
The idea of the chart is to show progress against the items that have been completed in each phase. Every time an item is completed, an outcome is determined; if the item hasn't been completed, there is no outcome.
The Date is only used to determine the ItemPhase: the phase is derived from the date.
I would like the chart to look like this:
As you can see from the image, the Item Outcomes section is built out of the result of the Items Status section.
I have struggled to get started or to come up with an idea of how to build this, so any help is much appreciated.
Thanks for your support!
Upvotes: 3
Views: 417
Reputation: 5294
Here are some pointers to get you started.
Let's say you have your dataframe as df and you want to plot the ItemStatus cells for Phase2:
df = df[df['ItemPhase'] == 'Phase2']
total = df['ItemStatus'].count()
The first thing you can plot is the dotted bar with the item count. If each status cell is 1.0 high (to be split between the Complete and WIP percentages), we can allocate something like 40% more (height=1.4) for the label.
ax.bar(0, 1.4, width=1, edgecolor='black', lw=1, ls='dotted', color="white")
Now let's plot the main part. You want a first bar that goes from 0 to the frequency of ItemStatus == WIP, and a second bar that starts at this frequency and goes up to 1. You can get each status count with value_counts and divide by total to get percentages. Note that value_counts orders the statuses by count, so the stacking order follows whichever status is more frequent.
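As a small illustration of the percentage step, the sketch below rebuilds the relevant columns of the question's sample table by hand (an assumption, since the real data source isn't shown) and compares the raw counts with value_counts(normalize=True), which returns the fractions directly instead of dividing by total.

```python
import pandas as pd

# Sample rows copied from the question; only the columns needed here.
df = pd.DataFrame({
    "ItemID": [1, 2, 3, 4, 5, 6, 7, 8],
    "ItemPhase": ["Phase1", "Phase2", "Phase1", "Phase3",
                  "Phase2", "Phase2", "Phase3", "Phase2"],
    "ItemStatus": ["Complete", "WIP", "Complete", "WIP",
                   "Complete", "WIP", "Complete", "Complete"],
})

# Keep only the Phase2 rows, as in the answer.
phase2 = df[df["ItemPhase"] == "Phase2"]

# Absolute count of each status within Phase2.
counts = phase2["ItemStatus"].value_counts()

# Same information expressed as fractions that sum to 1.
freqs = phase2["ItemStatus"].value_counts(normalize=True)

print(counts)
print(freqs)
```

Either form works for the bar heights; keeping the raw counts around is still useful because the labels show both the percentage and the count.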
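bottom = 0
# bg and fg are assumed to be lists of fill and text colors, one per status
# .items() replaces .iteritems(), which was removed in pandas 2.0
for i, (label, count) in enumerate(df['ItemStatus'].value_counts().items()):
    freq = count / float(total)
    # stack this bar on top of the previous one
    r, = ax.bar(0, freq, width=1, bottom=bottom, color=bg[i], edgecolor='black', lw=3)
    # write the label in the middle of the bar
    ax.text(r.get_x() + r.get_width()/2.,
            r.get_y() + r.get_height()/2.,
            '{}% ({}) Items {}'.format(int(freq * 100), count, label),
            ha="center", va="center", color=fg[i])
    bottom += freq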
Now you just need the "n Items" label. You can use the coordinates of the last bar plotted (r) to center it properly.
ax.text(r.get_x() + r.get_width()/2., 1.2,
        '{} Items'.format(total),
        ha="center", va='center', color='black')
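Putting the snippets above together, a self-contained sketch of one status cell might look like the following. The helper name draw_status_cell, the BG and FG color lists, and the Agg backend are assumptions made here so the example runs on its own; they are not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

BG = ["white", "tab:blue"]  # hypothetical fill colors, one per status
FG = ["black", "white"]     # hypothetical text colors, one per status


def draw_status_cell(ax, statuses, x=0.0):
    """Draw one stacked status cell centered at x and return its bars."""
    total = statuses.count()
    # Dotted outline, 40% taller than the cell, leaving room for the label.
    ax.bar(x, 1.4, width=1, edgecolor="black", lw=1, ls="dotted", color="white")
    bottom = 0.0
    bars = []
    for i, (label, count) in enumerate(statuses.value_counts().items()):
        freq = count / total
        # Each bar starts where the previous one ended.
        r, = ax.bar(x, freq, width=1, bottom=bottom,
                    color=BG[i % len(BG)], edgecolor="black", lw=3)
        ax.text(r.get_x() + r.get_width() / 2.0,
                r.get_y() + r.get_height() / 2.0,
                "{}% ({}) Items {}".format(int(round(freq * 100)), count, label),
                ha="center", va="center", color=FG[i % len(FG)])
        bottom += freq
        bars.append(r)
    # "n Items" label in the extra space above the stacked bars.
    ax.text(x, 1.2, "{} Items".format(total),
            ha="center", va="center", color="black")
    return bars


# ItemStatus values of the Phase2 rows in the question's sample data.
statuses = pd.Series(["WIP", "Complete", "WIP", "Complete"])
fig, ax = plt.subplots(figsize=(3, 4))
bars = draw_status_cell(ax, statuses)
ax.set_ylim(0, 1.5)
ax.axis("off")
```

Calling fig.savefig(...) or switching to an interactive backend and calling plt.show() would display the result. Taking x as a parameter is a design choice that makes it easier to place several cells side by side later.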
And here's what you get
Now you need a pandas way to iterate through all the cells (Phase x, Item Status, etc.).
Upvotes: 2
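For the remaining step of iterating over every phase, one option (not necessarily what the answer's author had in mind) is pd.crosstab with normalize="index", which gives one row per phase and one column per status. The DataFrame below is rebuilt by hand from the question's sample table, and the loop only prints the values that would be passed to the plotting code.

```python
import pandas as pd

# Sample rows copied from the question; only the columns needed here.
df = pd.DataFrame({
    "ItemPhase": ["Phase1", "Phase2", "Phase1", "Phase3",
                  "Phase2", "Phase2", "Phase3", "Phase2"],
    "ItemStatus": ["Complete", "WIP", "Complete", "WIP",
                   "Complete", "WIP", "Complete", "Complete"],
})

# Rows are phases, columns are statuses, values are per-phase fractions.
shares = pd.crosstab(df["ItemPhase"], df["ItemStatus"], normalize="index")

# Number of items in each phase, used for the "n Items" label.
totals = df.groupby("ItemPhase").size()

for x, (phase, row) in enumerate(shares.iterrows()):
    # x is the horizontal position this phase's cell would be drawn at.
    print(x, phase, int(totals[phase]), row.to_dict())
```

Each iteration provides a bar position, the phase name, the item count and the status fractions, which is everything the single-cell drawing code needs.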