overlap graph of missing values (NaN values) with filled values

Question

I have the pandas DataFrame below, which contains two columns. The first column holds the original values, including the missing values (NaN). The second column holds the result of the imputation, i.e. the values that fill the NaNs in the first column. How can I plot both columns in the same graph so that it shows the original values together with the filled values?

Data=pd.DataFrame([[3.83092724,        np.nan],
   [       np.nan, 3.94103207],
   [       np.nan, 3.86621724],
   [3.48386179,        np.nan],
   [       np.nan, 3.7430167 ],
   [3.2382959 ,        np.nan],
   [3.9143139 ,        np.nan],
   [4.46676265,        np.nan],
   [       np.nan, 3.9340262 ],
   [3.650658  ,        np.nan],
   [       np.nan, 3.10590516],
   [4.19497691,        np.nan],
   [4.11873876,        np.nan],
   [4.15286075,        np.nan],
   [4.67441617,        np.nan],
   [4.50631534,        np.nan],
   [       np.nan, 4.01349688],
   [       np.nan, 3.48459778],
   [       np.nan, 3.83495488],
   [       np.nan, 3.10590516],
   [       np.nan, 4.09355884],
   [4.8433281 ,        np.nan],
   [       np.nan, 3.33450675],
   [4.86672126,        np.nan],
   [       np.nan, 3.2382959 ],
   [       np.nan, 3.48210011],
   [       np.nan, 3.00958811],
   [       np.nan, 3.05774663]], columns=['original', 'filled'])

Stef · Accepted Answer

You need markers; otherwise an original value that is surrounded by missing values would not show up at all, because a line cannot be drawn through a single isolated point.
We first plot the original values. Then, in the filled column, every missing value directly adjacent to an existing filled value is replaced with the original value from the same row, so that the dashed line runs from that original value to the next or preceding filled value. Finally, we plot this amended filled column as a dashed line.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df=pd.DataFrame([[3.83092724, np.nan],
   [       np.nan, 3.94103207],
   [       np.nan, 3.86621724],
   [3.48386179,        np.nan],
   [       np.nan, 3.7430167 ],
   [3.2382959 ,        np.nan],
   [3.9143139 ,        np.nan],
   [4.46676265,        np.nan],
   [       np.nan, 3.9340262 ],
   [3.650658  ,        np.nan],
   [       np.nan, 3.10590516],
   [4.19497691,        np.nan],
   [4.11873876,        np.nan],
   [4.15286075,        np.nan],
   [4.67441617,        np.nan],
   [4.50631534,        np.nan],
   [       np.nan, 4.01349688],
   [       np.nan, 3.48459778],
   [       np.nan, 3.83495488],
   [       np.nan, 3.10590516],
   [       np.nan, 4.09355884],
   [4.8433281 ,        np.nan],
   [       np.nan, 3.33450675],
   [4.86672126,        np.nan],
   [       np.nan, 3.2382959 ],
   [       np.nan, 3.48210011],
   [       np.nan, 3.00958811],
   [       np.nan, 3.05774663]], columns=['original', 'filled'])

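_, ax = plt.subplots()
# Solid line with markers for the original values
df.original.plot(marker='o', ax=ax)

# Mask: rows where 'filled' is NaN but the row directly before or after has a filled value
m = df.filled.isna() & (df.filled.shift(1).notna() | df.filled.shift(-1).notna())
# Put the original value into those rows, then draw the dashed line in the same color
df.filled.fillna(df.loc[m, 'original']).plot(ls='--', ax=ax, color=ax.get_lines()[0].get_color())
plt.show()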
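
To make it easier to see what the mask `m` selects, here is a minimal sketch on a tiny made-up frame. The name `small` and its values are illustrative assumptions, not data from the question. It only inspects the data and does not plot anything.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: exactly one of the two columns is set in each row.
small = pd.DataFrame({
    "original": [1.0, np.nan, np.nan, 4.0, 5.0, 6.0],
    "filled":   [np.nan, 2.0, 3.0, np.nan, np.nan, np.nan],
})

filled = small["filled"]
# True where 'filled' is missing but the previous or the next row has a filled value.
m = filled.isna() & (filled.shift(1).notna() | filled.shift(-1).notna())

# fillna aligns on the index, so only the masked rows receive the original value;
# all other gaps in 'filled' stay NaN.
bridged = filled.fillna(small.loc[m, "original"])

print(m.tolist())        # [True, False, False, True, False, False]
print(bridged.tolist())  # [1.0, 2.0, 3.0, 4.0, nan, nan]
```

Rows 0 and 3 border the filled run, so they become the end points of the dashed segment. Rows 4 and 5 are not adjacent to any filled value and stay NaN, which is why the dashed line does not run over the rest of the original curve.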

The above is a clean solution for the general case. If the original values are drawn with a solid, opaque line and the filled values with a line no wider than that of the original values, you can take a shortcut: first draw the fully filled series as a dashed line, then draw the original values on top of it:

df.filled.fillna(df.original).plot(ax=ax, color='blue', ls='--')
df.original.plot(marker='o', ax=ax, color='blue')
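
The two lines above reuse an existing `ax`. As a self-contained sketch of the same shortcut, the snippet below wraps it in a helper. The function name `plot_original_over_filled`, the `demo` frame, and the default color are assumptions made for illustration only.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_original_over_filled(df, ax=None, color="tab:blue"):
    """Draw the gap-free dashed line first, then the solid original line on top."""
    if ax is None:
        _, ax = plt.subplots()
    # Every NaN in 'filled' is replaced by the original value, so this line has no gaps.
    combined = df["filled"].fillna(df["original"])
    combined.plot(ax=ax, color=color, ls="--", label="filled")
    # The solid line is drawn second and covers the dashes wherever original data exists.
    df["original"].plot(ax=ax, color=color, marker="o", label="original")
    ax.legend()
    return ax


# Hypothetical toy data, not the values from the question.
demo = pd.DataFrame({
    "original": [1.0, np.nan, np.nan, 4.0, 5.0],
    "filled":   [np.nan, 2.0, 3.0, np.nan, np.nan],
})
ax = plot_original_over_filled(demo)
```

Call `plt.show()` afterwards in an interactive session. The drawing order matters here: if the dashed line were drawn last, or were wider or drawn over a semi-transparent solid line, it would remain visible on top of the original segments.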
