Reputation: 1607
I am running into some issues adding Matplotlib lines to a Pandas plot. I am trying to plot a straight line, using the slope to determine the start and end points, but the resulting graph does not look like a straight line at all.
I have simplified the case to the MVCE below. The initial part is for setup to replicate the key feature of the complicated dataframe I have.
import pandas as pd
import matplotlib.pyplot as plt
LEN_SER = 23
dates = pd.date_range('2015-07-03', periods=LEN_SER, freq='B')
df = pd.DataFrame(range(1,LEN_SER+1), index=dates)
ts = df.iloc[:,0]
# The above is the setup of the MVCE to replicate the issue.
fig = plt.figure()
ax1 = plt.subplot2grid((1, 1), (0, 0))
ax1.plot([ts.index[5], ts.index[20]],
         [ts.iloc[5], ts.iloc[5] + (1.0 * (20 - 5))], 'o-')  # .iloc: explicit positional lookup
ts.plot(ax=ax1)
plt.show()
This gives a graph with a wavy line because of the weekends. The Matplotlib call is affecting how Pandas plots the series: if I take out the ax1.plot() call, the series becomes a straight line.
So the question is: how do I draw straight lines on my Pandas plot with Matplotlib? Put another way, I want the plot to treat the axis labels as categories so that weekends are ignored. That way, I am hoping that Matplotlib and Pandas will both give a straight line.
Upvotes: 0
Views: 451
Reputation: 121
As you correctly observe, if you delete the line ax1.plot(), then matplotlib treats your dates as categories, and the pandas plot is a nice straight line. However, in the command
ax1.plot([ts.index[5], ts.index[20]],
[ts[5], ts[5] + (1.0 * (20 - 5))], 'o-')
you ask matplotlib to draw a line between two points, and in doing so matplotlib recognizes the x-values as dates. That is why the pandas plot, which is a straight line with respect to date categories (5 a week), becomes a wavy line with respect to calendar dates (7 a week). This is also correct, because with respect to calendar dates your data simply isn't represented by a straight line.
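To make the "5 a week versus 7 a week" point concrete, here is a minimal sketch that rebuilds the same business-day index as the question and measures the calendar-day gap between consecutive entries. The variable names gaps and slope_per_calendar_day are only illustrative.

```python
import pandas as pd

LEN_SER = 23
dates = pd.date_range('2015-07-03', periods=LEN_SER, freq='B')

# Calendar days elapsed between each business day and the one before it.
gaps = dates.to_series().diff().dropna().dt.days

# The series grows by 1 per business day, so its slope per *calendar* day
# is 1 inside a week but only 1/3 across a weekend.
slope_per_calendar_day = 1 / gaps

print(gaps.value_counts().sort_index())
print(slope_per_calendar_day.round(3).unique())
```

On a real date axis the slope therefore alternates between two values, which is the waviness seen in the plot.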
You can force the category interpretation by replacing the dates with strings through
df.index = df.index.strftime('%Y-%m-%d')
before defining ts. That results in the plot
Now the matplotlib plot is just two categories against two values, and matplotlib does not realize that these two categories are among the categories in the pandas plot. (Changing the order of the two plots at least preserves your x-axis.) Modifying the matplotlib plot to
ax1.plot([5, 20], [ts.iloc[5], ts.iloc[5] + (1.0 * (20 - 5))], 'o-')
plots a line between categories 5 and 20 and finally gives you two straight lines with respect to a categorical x-axis.
Full code:
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('seaborn-v0_8') # (optional - this style was called 'seaborn' before Matplotlib 3.6)
LEN_SER = 23
dates = pd.date_range('2015-07-03', periods=LEN_SER, freq='B')
df = pd.DataFrame(range(1,LEN_SER+1), index=dates)
df.index = df.index.strftime('%Y-%m-%d') # dates -> categories (string)
ts = df.iloc[:,0]
# The above is the setup of the MVCE to replicate the issue.
fig = plt.figure()
ax1 = plt.subplot2grid((1, 1), (0, 0))
ax1.plot([5, 20], [ts.iloc[5], ts.iloc[5] + (1.0 * (20 - 5))], 'o-')
# x coordinates 'categories' 5 and 20
ts.plot(ax=ax1)
plt.show()
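A variant of the same idea, sketched here as an assumption rather than as part of the answer above: keep the DatetimeIndex, plot both lines against integer positions, and only use the dates for tick labels. The endpoint dates are looked up with Index.get_loc, and the choice of a tick every fifth position is arbitrary.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

LEN_SER = 23
dates = pd.date_range('2015-07-03', periods=LEN_SER, freq='B')
ts = pd.Series(range(1, LEN_SER + 1), index=dates)

# One x slot per business day, so weekends take up no space on the axis.
positions = np.arange(len(ts))

fig, ax = plt.subplots()
ax.plot(positions, ts.to_numpy(), label='series')

# Translate the endpoint dates into positions on the same axis.
start = ts.index.get_loc(pd.Timestamp('2015-07-10'))
end = ts.index.get_loc(pd.Timestamp('2015-07-31'))
y_start = ts.iloc[start]
ax.plot([start, end], [y_start, y_start + 1.0 * (end - start)], 'o-', label='line')

# Show dates as labels on every fifth slot.
tick_pos = positions[::5]
ax.set_xticks(tick_pos)
ax.set_xticklabels(list(ts.index[tick_pos].strftime('%Y-%m-%d')), rotation=45, ha='right')
ax.legend()
fig.tight_layout()
```

Call plt.show() afterwards to display the figure. Because both lines share the same integer x coordinates, the order of the two plot calls no longer matters.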
Upvotes: 1
Reputation: 651
For simplicity I started from 2015-07-04. Does it work for you?
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
LEN_SER = 21
dates = pd.date_range('2015-07-04', periods=LEN_SER, freq='B')
the_axes = []
# collect a [Monday, Friday] x-range for each week
for monday, friday in zip(dates[dates.weekday==0], dates[dates.weekday==4]):
    the_axes.append([monday.date(), friday.date()])
x = dates
y = range(1,LEN_SER+1)
n_Axes = len(the_axes)
fig,(axes) = plt.subplots(1, n_Axes, sharey=True, figsize=(15,8))
for i in range(n_Axes):
    ax = axes[i]
    ax.plot(x, y)
    ax.set_xlim(the_axes[i])  # show only this week's Monday-Friday range
fig.autofmt_xdate()
print(dates)
plt.show()
Upvotes: 0
Reputation: 1053
You're right - it is due to weekends. You can tell by the slope: five consecutive business days have a steeper incline (+1 each day) than the three calendar days spanning a weekend (+1 total). So, what exactly do you want to plot? If you want to literally plot the blue line, you can interpolate between your two points like this:
...
# ts.plot(ax=ax1)
ts.iloc[[5,20]].resample('D').interpolate(method='linear').plot(ax=ax1)
plt.show()
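As a rough check of what the resampling step produces, the sketch below rebuilds the series from the question and inspects the interpolated values without plotting. It assumes linear interpolation is what is wanted; the names endpoints and daily are only illustrative.

```python
import pandas as pd

LEN_SER = 23
dates = pd.date_range('2015-07-03', periods=LEN_SER, freq='B')
ts = pd.Series(range(1, LEN_SER + 1), index=dates)

# Keep only the two endpoints, then fill in every calendar day between them.
endpoints = ts.iloc[[5, 20]]
daily = endpoints.resample('D').interpolate(method='linear')

print(daily.head())
print(daily.diff().dropna().round(4).unique())  # one constant step per calendar day
```

The step is constant per calendar day, weekends included, so this line is straight on a date axis, while the original business-day series is not.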
Upvotes: 0
Reputation: 795
You already answered the question: "probably due to the weekends".
Replace
dates = pd.date_range('2015-07-03', periods=LEN_SER, freq='B')
with
dates = pd.date_range('2015-07-03', periods=LEN_SER, freq='D')
B - business day frequency
D - calendar day frequency
And your lines are straightened.
Upvotes: 0