David Yang

Reputation: 2141

Matplotlib: How to plot Time Series on top of Scatter Plot

I have found solutions to similar questions, but they all produce odd results.

I have a plot that looks like this:

enter image description here

generated using this code:

ax1 = dft.plot(kind='scatter',x='end_date',y='pct',c='fte_grade',colormap='Reds',colorbar=False,edgecolors='red',vmin=4,vmax=10)
ax1.set_xticklabels([datetime.datetime.fromtimestamp(ts / 1e9).strftime('%Y-%m-%d') for ts in ax1.get_xticks()])

dfb.plot(kind='scatter',x='end_date',y='pct',c='fte_grade',colormap='Blues',title='%s Polls'%state,ax=ax1,colorbar=False,edgecolors='blue',vmin=4,vmax=10)
plt.ylim(30,70)
plt.axhline(50,ls='--',alpha=0.5,color='grey')
plt.xticks(rotation=20)

Now, whenever I try to plot a line on top of this, I get something like the following:

import matplotlib.pyplot as plt
import pandas as pd

x = dft['pct']
u = dft['Trump Odds']
t = list(pd.to_datetime(dft['end_date']))

# plt.hold(True) was removed in Matplotlib 3.0; artists share the axes by default
plt.subplot2grid((1, 1), (0, 0))
plt.plot(t, x)
plt.scatter(t, u)
plt.show()

enter image description here

If it's not clear, this is not what I want. The dots represent individual polls, and I have data for a line that aggregates them. I think this has something to do with datetimes and the fact that there can be multiple polls for a given date. My guess is that the plotter gets confused by duplicate values for the same date, assumes this is not a time series, and so draws the line without any continuity.

There must be something in Python that can draw a time series on top of a scatter plot with a time x-axis, right?

dft data:

              end_date   pct  fte_grade  Trump Odds
0  1598054400000000000  32.0          6   32.000000
1  1588550400000000000  32.0          7   32.000000
2  1582156800000000000  39.0          8   34.666667
3  1585180800000000000  33.0          8   34.206897
4  1587600000000000000  29.0          8   33.081081
5  1590019200000000000  32.0          8   33.025641
6  1559779200000000000  36.0          8   33.800000
7  1593043200000000000  32.0          8   32.400000

Upvotes: 0

Views: 841

Answers (1)

Renaud

Reputation: 2819

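Isn't your strange line caused by not sorting the DataFrame before plotting it? plt.plot connects the points in the order they appear, so unsorted dates make the line jump back and forth: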

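import matplotlib.pyplot as plt
import pandas as pd

# Sort by date so the line connects the points in chronological order
dft = dft.sort_values(by=['end_date'])
x = dft['pct']
u = dft['Trump Odds']
t = list(pd.to_datetime(dft['end_date']))

# plt.hold(True) was removed in Matplotlib 3.0; artists share the axes by default
plt.subplot2grid((1, 1), (0, 0))
plt.plot(t, x)
plt.scatter(t, u)
plt.show()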

enter image description here
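
For reference, here is a self-contained sketch of the same idea using the eight rows posted in the question. It assumes the dots should be the individual polls (pct) and the line the aggregate (Trump Odds), which is how I read the description, and it assumes end_date holds nanosecond epoch integers as in the posted table. The helper name plot_polls_with_trend is made up for illustration, and the styling is only a rough copy of the original scatter call.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Sample rows copied from the dft table in the question
dft = pd.DataFrame({
    "end_date": [1598054400000000000, 1588550400000000000,
                 1582156800000000000, 1585180800000000000,
                 1587600000000000000, 1590019200000000000,
                 1559779200000000000, 1593043200000000000],
    "pct": [32.0, 32.0, 39.0, 33.0, 29.0, 32.0, 36.0, 32.0],
    "fte_grade": [6, 7, 8, 8, 8, 8, 8, 8],
    "Trump Odds": [32.0, 32.0, 34.666667, 34.206897,
                   33.081081, 33.025641, 33.8, 32.4],
})


def plot_polls_with_trend(df):
    """Hypothetical helper: scatter the polls, then draw the aggregate line."""
    # Convert nanosecond epoch integers to datetimes, then sort chronologically
    # so the line is drawn left to right instead of in row order.
    df = df.assign(end_date=pd.to_datetime(df["end_date"], unit="ns"))
    df = df.sort_values("end_date")

    fig, ax = plt.subplots()
    # Aggregate line first, so it is the first Line2D on the axes
    ax.plot(df["end_date"], df["Trump Odds"], color="red", label="Aggregate")
    # Individual polls, coloured by grade as in the original scatter call
    ax.scatter(df["end_date"], df["pct"], c=df["fte_grade"], cmap="Reds",
               edgecolors="red", vmin=4, vmax=10, label="Individual polls")
    ax.axhline(50, ls="--", alpha=0.5, color="grey")
    fig.autofmt_xdate(rotation=20)  # rotate the date tick labels
    ax.legend()
    return df, ax


sorted_dft, ax = plot_polls_with_trend(dft)
```

Call plt.show() afterwards to display the figure. Because both the scatter and the line get real datetimes on the same axes, Matplotlib formats the date ticks itself and there should be no need to rewrite the tick labels by hand with set_xticklabels.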

Upvotes: 1
