Reputation: 131
I have the pandas DataFrame below, which contains two columns. The first column holds the original values, including the missing (NaN) values, and the second column holds the imputed values that fill the NaNs in the first column. How can I plot both columns on the same graph, so that it shows the original values together with the filled values?
Data=pd.DataFrame([[3.83092724, np.nan],
[ np.nan, 3.94103207],
[ np.nan, 3.86621724],
[3.48386179, np.nan],
[ np.nan, 3.7430167 ],
[3.2382959 , np.nan],
[3.9143139 , np.nan],
[4.46676265, np.nan],
[ np.nan, 3.9340262 ],
[3.650658 , np.nan],
[ np.nan, 3.10590516],
[4.19497691, np.nan],
[4.11873876, np.nan],
[4.15286075, np.nan],
[4.67441617, np.nan],
[4.50631534, np.nan],
[ np.nan, 4.01349688],
[ np.nan, 3.48459778],
[ np.nan, 3.83495488],
[ np.nan, 3.10590516],
[ np.nan, 4.09355884],
[4.8433281 , np.nan],
[ np.nan, 3.33450675],
[4.86672126, np.nan],
[ np.nan, 3.2382959 ],
[ np.nan, 3.48210011],
[ np.nan, 3.00958811],
[ np.nan, 3.05774663]], columns=['original', 'filled'])
Upvotes: 1
Views: 248
Reputation: 30639
You need markers; otherwise the chart makes no sense wherever an individual original value is surrounded by missing values, because a single isolated point has no line segment to draw.
We first plot the original values. Then, in the filled column, we fill every missing value that is directly adjacent to an existing filled value with the corresponding original value, so that the dashed line runs from that original value to the next or preceding filled value. Finally, we plot this amended filled column as a dashed line.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df=pd.DataFrame([[3.83092724, np.nan],
[ np.nan, 3.94103207],
[ np.nan, 3.86621724],
[3.48386179, np.nan],
[ np.nan, 3.7430167 ],
[3.2382959 , np.nan],
[3.9143139 , np.nan],
[4.46676265, np.nan],
[ np.nan, 3.9340262 ],
[3.650658 , np.nan],
[ np.nan, 3.10590516],
[4.19497691, np.nan],
[4.11873876, np.nan],
[4.15286075, np.nan],
[4.67441617, np.nan],
[4.50631534, np.nan],
[ np.nan, 4.01349688],
[ np.nan, 3.48459778],
[ np.nan, 3.83495488],
[ np.nan, 3.10590516],
[ np.nan, 4.09355884],
[4.8433281 , np.nan],
[ np.nan, 3.33450675],
[4.86672126, np.nan],
[ np.nan, 3.2382959 ],
[ np.nan, 3.48210011],
[ np.nan, 3.00958811],
[ np.nan, 3.05774663]], columns=['original', 'filled'])
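_, ax = plt.subplots()
# Original values: solid line with markers, so isolated points stay visible.
df.original.plot(marker='o', ax=ax)
# Rows where 'filled' is missing but the previous or next row has a filled value.
m = df.filled.isna() & (df.filled.shift(1).notna() | df.filled.shift(-1).notna())
# Bridge those rows with the original value and draw the result dashed, in the same color.
df.filled.fillna(df.loc[m, 'original']).plot(ls='--', ax=ax, color=ax.get_lines()[0].get_color())
# Variant with a fixed color: fill every gap with the original value, draw it dashed, then draw the original on top.
df.filled.fillna(df.original).plot(ax=ax, color='blue', ls='--')
df.original.plot(marker='o', ax=ax, color='blue')
plt.show()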
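To see what the adjacency mask does without plotting anything, here is a minimal sketch on a tiny made-up frame. The name `small` and its values are hypothetical and only chosen for illustration; the column layout is the same as in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical six-row frame with the same two columns as the question's data.
small = pd.DataFrame({
    "original": [1.0, np.nan, np.nan, 4.0, 5.0, 6.0],
    "filled":   [np.nan, 2.0, 3.0, np.nan, np.nan, np.nan],
})

gap = small["filled"].isna()
# True where 'filled' is missing but the previous or next row has a filled value.
m = gap & (small["filled"].shift(1).notna() | small["filled"].shift(-1).notna())

# Only the rows selected by the mask receive the original value;
# fillna aligns on the index, so all other gaps stay NaN.
bridged = small["filled"].fillna(small.loc[m, "original"])

print(m.tolist())
print(bridged.tolist())
```

Only rows 0 and 3 are bridged, so the dashed line would connect the original points at both ends of the filled run, while rows 4 and 5 stay NaN and are drawn only by the solid original line.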
Upvotes: 1