Reputation: 5822
I have 3 dataframes, train_data, validation_data and test_data, and I need to plot them one after another in different colors, so that it looks like one line divided into 3 colored parts. I tried to do that by moving the start of the x-axis with xlim for the second and third time series, as the following code shows, but it plots all of them starting from x=0. How can I fix it?
train_data.loc[idx].plot(kind='line'
, use_index=False
, color='blue'
, label='Training Data'
, legend=False)
validation_data.loc[idx].plot(kind='line'
, use_index=False
, figsize=(20, 5)
, xlim=362
, color='red'
, label='Validation Data'
, legend=False)
test_data.loc[idx].plot(kind='line'
, use_index=False
, figsize=(20, 5)
, xlim=481
, color='green'
, label='Test Data'
, legend=False)
plt.xlim(xmin=0)
plt.legend(loc=1, prop={'size': 'xx-small'})
plt.savefig("data.pdf")
plt.clf()
plt.close()
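A minimal sketch of why this attempt cannot work, using a made-up four-point Series in place of the real data: with use_index=False pandas draws the points at x = 0, 1, 2, 3 regardless of the index, and xlim only changes the visible window of the axis, not where the line is drawn.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for one validation segment; its index already starts at 362
segment = pd.Series([1.0, 3.0, 2.0, 4.0], index=range(362, 366))

ax = segment.plot(kind='line', use_index=False, xlim=(362, 400))

# use_index=False ignores the index and draws the points at 0, 1, 2, 3
xs = [float(x) for x in ax.get_lines()[0].get_xdata()]
# xlim only moves the visible window; the line itself stays at x = 0..3
view = tuple(float(v) for v in ax.get_xlim())
print(xs, view)
plt.close(ax.figure)
```

So the segment ends up outside the window you asked for instead of being shifted into it.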
UPDATE:
All 3 dataframes have the shape (N, 28). There are 138 different indexes (idx), and every dataframe holds part of each index: each index is a time series that was split into three parts for the training, validation and test datasets. I need to plot only the first column, var0, of each index. That's why I'm using <df>.loc[idx].iloc[:, 0].
df=
idx var0 var1 var2 var3 var4 ... var28
5171 10.0 2.8 0.0 5.0 1.0 ... 9.4
5171 40.9 2.5 3.4 4.5 1.3 ... 7.7
5171 60.7 3.1 5.2 6.6 3.4 ... 1.0
...
5171 0.5 1.3 5.1 0.5 0.2 ... 0.4
4567 1.5 2.0 1.0 4.5 0.1 ... 0.4
4567 4.4 2.0 1.3 6.4 0.1 ... 3.3
4567 6.3 3.0 1.5 7.6 1.6 ... 1.6
...
4567 0.7 1.4 1.4 0.3 4.2 ... 1.7
...
9584 0.3 2.6 0.0 5.2 1.6 ... 9.7
9584 0.5 1.2 8.3 3.4 1.3 ... 1.7
9584 0.7 3.0 5.6 6.6 3.0 ... 1.0
...
9584 0.7 1.3 0.1 0.0 2.0 ... 1.7
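A small sketch of this layout, built from the first rows of the table above (only three columns, so it is a hypothetical miniature, not the real frame). It shows that .loc[idx].iloc[:, 0] returns var0 for one series, and that every row of that result still carries the same index label, which is why the index alone cannot serve as x positions.

```python
import pandas as pd

# Hypothetical miniature of one frame: repeated index labels mark which
# time series each row belongs to (values copied from the table above)
df = pd.DataFrame(
    {"var0": [10.0, 40.9, 60.7, 1.5, 4.4],
     "var1": [2.8, 2.5, 3.1, 2.0, 2.0],
     "var2": [0.0, 3.4, 5.2, 1.0, 1.3]},
    index=pd.Index([5171, 5171, 5171, 4567, 4567], name="idx"),
)

idxs = df.index.unique()                # distinct series ids, in order of appearance
series_5171 = df.loc[5171].iloc[:, 0]   # var0 of series 5171

print(idxs.tolist())
print(series_5171.tolist())
print(series_5171.index.tolist())       # the same label on every row
```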
I tried to combine all three dataframes into one and then plot it using slicing, as @Brendan Cox suggested. But I'm not getting the result I need: the plots still start from x=0. Here is the code:
data = pd.concat([train_data.loc[idx].iloc[:, 0], validation_data.loc[idx].iloc[:, 0], test_data.loc[idx].iloc[:, 0]])
data.iloc[0:362].plot(kind='line'
, use_index=False
, figsize=(20,5)
, color='blue'
, label='Training Data'
, legend=False)
data.iloc[362:481].plot(kind='line'
, use_index=False
, figsize=(20, 5)
, color='red'
, label='Validation Data'
, legend=False)
data.iloc[481:].plot(kind='line'
, use_index=False
, figsize=(20, 5)
, color='green'
, label='Test Data'
, legend=False)
I attached the resulting plot (which is wrong!). I need the red and green lines to continue after the blue line.
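One way the concatenated approach could work, sketched with made-up short Series in place of the real var0 columns: use_index=False numbers every slice from 0, and simply dropping it is not enough because all rows share the label idx. Concatenating with ignore_index=True gives the combined Series a fresh 0..N-1 index, so each slice keeps its own x positions under the default use_index=True. In the question the boundaries would be 362 and 481.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-ins for the var0 parts of one idx (same repeated label)
train = pd.Series([1.0, 2.0, 3.0, 4.0], index=[5171] * 4)
validation = pd.Series([4.5, 5.0, 5.5], index=[5171] * 3)
test = pd.Series([6.0, 6.5], index=[5171] * 2)

# ignore_index=True replaces the repeated labels with a RangeIndex 0..8
data = pd.concat([train, validation, test], ignore_index=True)
b1 = len(train)               # first boundary (362 in the question)
b2 = b1 + len(validation)     # second boundary (481 in the question)

fig, ax = plt.subplots(figsize=(20, 5))
# use_index stays at its default (True), so each slice keeps its positions
data.iloc[:b1].plot(ax=ax, color='blue', label='Training Data')
data.iloc[b1:b2].plot(ax=ax, color='red', label='Validation Data')
data.iloc[b2:].plot(ax=ax, color='green', label='Test Data')
ax.legend(loc=1)

starts = [float(line.get_xdata()[0]) for line in ax.get_lines()]
print(starts)
plt.close(fig)
```

Each slice starts one step after the previous one ends, so a small gap remains between colors; starting the next slice one point earlier (e.g. data.iloc[b1 - 1:b2]) makes the pieces touch.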
Upvotes: 1
Views: 1424
Reputation: 5822
With help from this answer, I was able to fix the issue as follows:
import pandas as pd
import matplotlib.pyplot as plt
# idxs is assumed to hold the distinct index values, e.g. train_data.index.unique()
for idx in idxs:
    # var0 of this idx; reset_index gives each part a fresh 0..n-1 index
    train = train_data.loc[idx].iloc[:, 0].reset_index(drop=True)
    validation = validation_data.loc[idx].iloc[:, 0].reset_index(drop=True)
    test = test_data.loc[idx].iloc[:, 0].reset_index(drop=True)
    # Shift validation and test so each starts where the previous part ended
    # (computed per idx, in case the parts differ in length between indexes)
    limit_1 = len(train)                 # 362 for the first idx
    limit_2 = limit_1 + len(validation)  # 481 for the first idx
    validation.index = range(limit_1, limit_1 + len(validation))
    test.index = range(limit_2, limit_2 + len(test))
    train.plot(kind='line'
               , figsize=(20, 5)
               , color='blue'
               , label='Training Data'
               , legend=False)
    validation.plot(kind='line'
                    , figsize=(20, 5)
                    , color='red'
                    , label='Validation Data'
                    , legend=False)
    test.plot(kind='line'
              , figsize=(20, 5)
              , color='green'
              , label='Test Data'
              , legend=False)
    plt.legend(loc=1, prop={'size': 'xx-small'})
    plt.title(str(idx))
    plt.savefig(str(idx) + ".pdf")
    plt.clf()
    plt.close()
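The two limits follow a general pattern: each segment starts where the previous one ended, so the start positions are a running sum of the segment lengths. A minimal sketch in plain Python; segment_bounds is a hypothetical helper, and the test length of 100 is a made-up placeholder since the question does not state it (362 and 119 follow from the limits above).

```python
from itertools import accumulate


def segment_bounds(lengths):
    """Return (start, stop) x-ranges for consecutive segments.

    Hypothetical helper: segment i occupies range(start_i, stop_i), where
    stop_i is the running total of the lengths up to and including i.
    """
    stops = list(accumulate(lengths))
    starts = [0] + stops[:-1]
    return list(zip(starts, stops))


# Training and validation lengths from the question; test length is made up
bounds = segment_bounds([362, 119, 100])
print(bounds)
```

Inside the loop, validation.index = range(*bounds[1]) would then replace the hand-computed assignment, and the same works for any number of parts.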
Upvotes: 0
Reputation: 4011
If I'm understanding correctly, you should be able to simply subset (i.e., slice) your input data along the x-axis and plot each portion of the line, e.g.:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("https://vincentarelbundock.github.io/Rdatasets/csv/fpp2/goog200.csv", index_col=0)
df['value'].plot()
df.loc[0:25,'value'].plot()
df.loc[25:150, 'value'].plot()
df.loc[150:, 'value'].plot()
plt.show()
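The same idea without downloading the CSV, as a sketch with synthetic values standing in for the goog200 series (an index starting at 1 is assumed, to mirror the row numbers read with index_col=0). Because .loc slices by label and includes both endpoints, adjacent slices share their boundary point, so the three colored pieces join into one continuous line.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Synthetic stand-in for df['value']: 10 points labelled 1..10
s = pd.Series([3.0, 4.0, 3.5, 5.0, 6.0, 5.5, 7.0, 8.0, 7.5, 9.0],
              index=range(1, 11))

fig, ax = plt.subplots()
s.loc[:4].plot(ax=ax, color='blue')    # labels 1..4
s.loc[4:8].plot(ax=ax, color='red')    # labels 4..8 (shares label 4)
s.loc[8:].plot(ax=ax, color='green')   # labels 8..10 (shares label 8)

segments = [[float(x) for x in line.get_xdata()] for line in ax.get_lines()]
print(segments)
plt.close(fig)
```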
Edit per comments below: using iloc[] together with use_index=False seems to replicate the 'starting each plot at 0' behavior. Note that your ilocs do not select a column. Thus, you may need to revise both your iloc and use_index=False.
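import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("https://vincentarelbundock.github.io/Rdatasets/csv/fpp2/goog200.csv", index_col=0)
# use_index=False draws every slice from x=0, reproducing the problem
df.iloc[0:25,1].plot(use_index=False)
df.iloc[25:150, 1].plot(use_index=False)
df.iloc[150:, 1].plot(use_index=False)
plt.show()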
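For comparison, a sketch of the positional version that does continue: .iloc slices keep their original index labels, so plotting them with the default use_index=True places each slice at its own x positions. Synthetic data stands in for the CSV, the boundaries 25 and 150 are scaled down to 3 and 7, and first_x is a hypothetical helper that reports where each drawn slice starts.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

s = pd.Series(range(10), index=range(1, 11), dtype=float)  # labels 1..10


def first_x(use_index):
    """Plot three iloc slices and return the x value where each line starts."""
    fig, ax = plt.subplots()
    for part in (s.iloc[:3], s.iloc[3:7], s.iloc[7:]):
        part.plot(ax=ax, use_index=use_index)
    starts = [float(line.get_xdata()[0]) for line in ax.get_lines()]
    plt.close(fig)
    return starts


print(first_x(use_index=False))  # every slice restarts at 0
print(first_x(use_index=True))   # slices continue from their own labels
```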
Upvotes: 2