Reputation: 83
I am trying to plot three time series with different start dates on the same x-axis, similar to the question "How to plot timeseries with different start date on the same x axis", except that my x-axis has dates instead of days.
My data frame is structured as:
Date ColA Label
01/01/2019 1.0 Training
02/01/2019 1.0 Training
...
14/09/2020 2.0 Test1
...
06/01/2021 4.0 Test2
...
I have defined each time series as:
train = df.loc['01/01/2019':'05/08/2020', 'ColA']
test1 = df.loc['14/09/2020':'20/12/2020', 'ColA']
test2 = df.loc['06/01/2021':'18/03/2021', 'ColA']
This is how the individual time series look when plotted:
But when I try to plot them on the same x-axis, the dates are not plotted in chronological order:
I am hoping to produce something like this (from MS Excel):
Any help would be great!
Thank you
Upvotes: 1
Views: 4648
Reputation: 417
Make sure that the 'Date' column in your DataFrame is imported as a datetime, not as a string.
If you find its dtype is "object", the dates were read as strings:
df = pd.read_csv('data.csv')
df['Date']
0 2019-01-01
1 2019-01-02
2 2019-01-03
Name: Date, Length: 830, dtype: object
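You need to convert it to datetime. Your dates are day-first (e.g. 14/09/2020), so tell pandas; otherwise they may be parsed month-first (02/01/2019 as 1 February). You can convert in two ways:
df = pd.read_csv('data.csv', parse_dates=['Date'], dayfirst=True)
OR
df = pd.read_csv('data.csv')
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')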
Both options give the same result:
df = pd.read_csv('data.csv', parse_dates=['Date'], dayfirst=True)
df['Date']
0 2019-01-01
1 2019-01-02
2 2019-01-03
...
Name: Date, Length: 830, dtype: datetime64[ns]
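A minimal, self-contained sketch of both options, using a few of the question's day-first rows. Reading from io.StringIO instead of data.csv is an assumption made only so the example runs on its own; dayfirst and format are standard pandas parameters.

```python
import io

import pandas as pd

# A few rows in the question's day-first format (stand-in for data.csv).
csv_text = """Date,ColA,Label
01/01/2019,1.0,Training
02/01/2019,1.0,Training
14/09/2020,2.0,Test1
06/01/2021,4.0,Test2
"""

# Option 1: parse while reading; dayfirst=True makes 02/01/2019 mean 2 January.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'], dayfirst=True)

# Option 2: read as strings, then convert with an explicit day-first format.
df2 = pd.read_csv(io.StringIO(csv_text))
df2['Date'] = pd.to_datetime(df2['Date'], format='%d/%m/%Y')

print(df['Date'].dtype)
print(df['Date'].tolist() == df2['Date'].tolist())
```

An explicit format is the stricter choice: a value that does not match raises an error instead of being silently guessed.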
Then you can plot directly:
plt.plot(df['Date'], df['ColA'])
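When you define the individual time series, check the date format as well. pandas displays datetimes as YYYY-MM-DD, and ISO strings like this are unambiguous, so use them in the slices (this requires 'Date' to be the index, e.g. df = df.set_index('Date')):
train = df.loc['2019-01-01':'2020-08-05', 'ColA'], and so on.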
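A small sketch of date slicing on a DatetimeIndex. The daily data is hypothetical (a plain counter in ColA); only the slice boundaries come from the question.

```python
import pandas as pd

# Hypothetical daily data covering all three periods from the question.
dates = pd.date_range('2019-01-01', '2021-03-18', freq='D')
df = pd.DataFrame({'Date': dates, 'ColA': range(len(dates))})

# Label-based date slicing needs 'Date' as a sorted DatetimeIndex.
df = df.set_index('Date')

# ISO-format strings are unambiguous; both slice endpoints are inclusive.
train = df.loc['2019-01-01':'2020-08-05', 'ColA']
test1 = df.loc['2020-09-14':'2020-12-20', 'ColA']
test2 = df.loc['2021-01-06':'2021-03-18', 'ColA']

print(len(train), len(test1), len(test2))
```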
I am assuming that your data is stored as CSV (or Excel). If so, be careful: MS Excel may change the formatting of the Date column whenever you open the file in Excel. Best practice is to always check the dtype of the 'Date' column with
df['Date'].dtype after importing the DataFrame.
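One way to make that check automatic is a small guard right after loading. check_date_column is a hypothetical helper name; is_datetime64_any_dtype is a real function in pandas.api.types.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype


def check_date_column(df, col='Date'):
    """Hypothetical helper: raise if `col` was not parsed as a datetime."""
    if not is_datetime64_any_dtype(df[col]):
        raise TypeError(f"{col!r} has dtype {df[col].dtype}, expected datetime64")


# Dates left as strings (e.g. after Excel re-saved the file) are rejected.
raw = pd.DataFrame({'Date': ['14/09/2020', '06/01/2021']})
try:
    check_date_column(raw)
except TypeError as err:
    print(err)

# After an explicit day-first conversion the check passes silently.
converted = raw.assign(Date=pd.to_datetime(raw['Date'], format='%d/%m/%Y'))
check_date_column(converted)
```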
Upvotes: 1
Reputation: 191
I assume you have a DataFrame consisting of at least date, record, and label columns, where label is training, test #1 or test #2.
Would sharex=True do the trick?
import matplotlib.pyplot as plt

fig, ax = plt.subplots(3, 1, sharex=True)
for j, i in enumerate(df['label'].unique()):
    # one subplot per label; x and y are positional (plot() has no x=/y= keywords)
    ax[j].plot(df[df['label'] == i]['date'],
               df[df['label'] == i]['record'])
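A runnable version of the shared-x subplots idea, under stated assumptions: the data is hypothetical (random records over the question's three date ranges), the lowercase column names follow the answer's assumption, and the Agg backend is chosen only so it runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical long-format data: three periods with different start dates.
periods = {'Training': ('2019-01-01', '2020-08-05'),
           'Test1': ('2020-09-14', '2020-12-20'),
           'Test2': ('2021-01-06', '2021-03-18')}
rng = np.random.default_rng(0)
frames = []
for label, (start, end) in periods.items():
    dates = pd.date_range(start, end, freq='D')
    frames.append(pd.DataFrame({'date': dates,
                                'record': rng.normal(size=len(dates)),
                                'label': label}))
df = pd.concat(frames, ignore_index=True)

labels = df['label'].unique()
fig, ax = plt.subplots(len(labels), 1, sharex=True)
for j, lab in enumerate(labels):
    part = df[df['label'] == lab]
    ax[j].plot(part['date'], part['record'])  # one line per subplot
    ax[j].set_title(lab)
```

Because the x-axis is shared and holds real datetimes, each period appears at its own position on one common date range.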
Alternatively, to draw all three series on a single set of axes, this should do it:
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(14, 6))
color = ['blue', 'red', 'orange']
for i, j in zip(df.Label.unique().tolist(), color):
    # one line per Label value, coloured j and named after the label i
    ax.plot(df['Date'][df.Label == i], df['ColA'][df.Label == i],
            color=j, label=i)
plt.legend(loc='best')
plt.show()
You basically want to call plot several times on the same matplotlib axes. Just use the original DataFrame (which includes all the labels); there is no need for the separate series.
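The same single-axes plot can be written with groupby in place of the boolean filters, a swapped-in technique rather than the answer's own loop. The frame below is hypothetical (constant ColA per label over the question's date ranges) and uses the question's column names.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical long-format frame using the question's column names.
periods = [('Training', '2019-01-01', '2020-08-05'),
           ('Test1', '2020-09-14', '2020-12-20'),
           ('Test2', '2021-01-06', '2021-03-18')]
df = pd.concat(
    [pd.DataFrame({'Date': pd.date_range(s, e, freq='D'),
                   'ColA': float(k + 1), 'Label': lab})
     for k, (lab, s, e) in enumerate(periods)],
    ignore_index=True)

colors = {'Training': 'blue', 'Test1': 'red', 'Test2': 'orange'}
fig, ax = plt.subplots(figsize=(14, 6))
# sort=False keeps the labels in order of first appearance
for label, part in df.groupby('Label', sort=False):
    ax.plot(part['Date'], part['ColA'], color=colors[label], label=label)
ax.legend(loc='best')
```

Mapping labels to colours with a dict avoids relying on the order of unique() lining up with a colour list.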
Upvotes: 0