Reputation: 671
I have an excel worksheet, let us say its name is 'ws_actual'. The data looks as below.
Project Name Date Paid Actuals Item Amount Cumulative Sum
A 2016-04-10 00:00:00 124.2 124.2
A 2016-04-27 00:00:00 2727.5 2851.7
A 2016-05-11 00:00:00 2123.58 4975.28
A 2016-05-24 00:00:00 2500 7475.28
A 2016-07-07 00:00:00 38374.6 45849.88
A 2016-08-12 00:00:00 2988.14 48838.02
A 2016-09-02 00:00:00 23068 71906.02
A 2016-10-31 00:00:00 570.78 72476.8
A 2016-11-09 00:00:00 10885.75 83362.55
A 2016-12-08 00:00:00 28302.95 111665.5
A 2017-01-19 00:00:00 4354.3 116019.8
A 2017-02-28 00:00:00 3469.77 119489.57
A 2017-03-29 00:00:00 267.75 119757.32
B 2015-04-27 00:00:00 2969.93 2969.93
B 2015-06-02 00:00:00 118.8 3088.73
B 2015-06-18 00:00:00 2640 5728.73
B 2015-06-26 00:00:00 105.6 5834.33
B 2015-09-03 00:00:00 11879.7 17714.03
B 2015-10-22 00:00:00 5303.44 23017.47
B 2015-11-08 00:00:00 52000 75017.47
B 2015-11-25 00:00:00 2704.13 77721.6
B 2016-03-09 00:00:00 59752.85 137474.45
B 2016-03-13 00:00:00 512.73 137987.18
.
.
.
Let us say there are many many more projects including A and B with Date Paid and Amount information. I would like to create a plot by project where x axis is 'Date Paid' and y axis is 'Cumulative Sum', but when I just implement the following code, it just combines every project and plot every 'Cumulative Sum' at one graph. I wonder if I need to divide the table by project, save each, and then bring one by one to plot the graph. It is a lot of work, so I am wondering if there is a smarter way to do so. Please help me, genius.
import pandas as pd
import matplotlib.pyplot as plt
ws_actual = pd.read_excel(actual_file[0], sheet_name=0)
ax = ws_actual.plot(x='Date Paid', y='Cumulative Sum', color='g')
Upvotes: 0
Views: 61
Reputation: 59579
Right now you are connecting all of the points, regardless of group. A simple loop will work here allowing you to group the DataFrame
and then plot each group as a separate curve. If you want you can define your own colorcycle if you have a lot of groups, so that colors do not repeat.
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(8,8))
for id, gp in ws_actual.groupby('Project Name'):
gp.plot(x='Date Paid', y='Cumulative Sum', ax=ax, label=id)
plt.show()
Upvotes: 2
Reputation: 2017
You could just iterate the projects:
for proj in ws_actual['Project'].unique():
ws_actual[ws_actual['Project'] == proj].plot(x='Date Paid', y='Cumulative Sum', color='g')
plt.show()
Or check out seaborn for an easy way to make a facet grid for which you can set a rows variable. Something along the lines of:
import seaborn as sns
g = sns.FacetGrid(ws_actual, row="Project")
g = g.map(plt.scatter, "Date Paid", "Cumulative Sum", edgecolor="w")
Upvotes: 2