hamid mohebzadeh

Reputation: 131

Plot original values overlaid with the filled (imputed) NaN values

I have the pandas DataFrame below, which contains two columns. The first column holds the original values, including the missing ones (NaN). The second column holds the result of imputation, i.e. the values used to fill the NaNs in the first column. How can I plot these two columns in the same graph so that it shows the original values together with the filled values, like the graph below:

import numpy as np
import pandas as pd

Data=pd.DataFrame([[3.83092724,        np.nan],
   [       np.nan, 3.94103207],
   [       np.nan, 3.86621724],
   [3.48386179,        np.nan],
   [       np.nan, 3.7430167 ],
   [3.2382959 ,        np.nan],
   [3.9143139 ,        np.nan],
   [4.46676265,        np.nan],
   [       np.nan, 3.9340262 ],
   [3.650658  ,        np.nan],
   [       np.nan, 3.10590516],
   [4.19497691,        np.nan],
   [4.11873876,        np.nan],
   [4.15286075,        np.nan],
   [4.67441617,        np.nan],
   [4.50631534,        np.nan],
   [       np.nan, 4.01349688],
   [       np.nan, 3.48459778],
   [       np.nan, 3.83495488],
   [       np.nan, 3.10590516],
   [       np.nan, 4.09355884],
   [4.8433281 ,        np.nan],
   [       np.nan, 3.33450675],
   [4.86672126,        np.nan],
   [       np.nan, 3.2382959 ],
   [       np.nan, 3.48210011],
   [       np.nan, 3.00958811],
   [       np.nan, 3.05774663]], columns=['original', 'filled'])

enter image description here

Upvotes: 1

Views: 248

Answers (1)

Stef

Reputation: 30639

You need markers; otherwise an original value that sits alone between missing values would not show up at all, because a line needs at least two adjacent points.
First plot the original values. Then, in the filled column, replace every missing value that is directly adjacent to an existing filled value with the corresponding original value, so that the dashed line connects each run of filled values to the neighbouring original points. Finally, plot this amended filled column as a dashed line.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df=pd.DataFrame([[3.83092724, np.nan],
   [       np.nan, 3.94103207],
   [       np.nan, 3.86621724],
   [3.48386179,        np.nan],
   [       np.nan, 3.7430167 ],
   [3.2382959 ,        np.nan],
   [3.9143139 ,        np.nan],
   [4.46676265,        np.nan],
   [       np.nan, 3.9340262 ],
   [3.650658  ,        np.nan],
   [       np.nan, 3.10590516],
   [4.19497691,        np.nan],
   [4.11873876,        np.nan],
   [4.15286075,        np.nan],
   [4.67441617,        np.nan],
   [4.50631534,        np.nan],
   [       np.nan, 4.01349688],
   [       np.nan, 3.48459778],
   [       np.nan, 3.83495488],
   [       np.nan, 3.10590516],
   [       np.nan, 4.09355884],
   [4.8433281 ,        np.nan],
   [       np.nan, 3.33450675],
   [4.86672126,        np.nan],
   [       np.nan, 3.2382959 ],
   [       np.nan, 3.48210011],
   [       np.nan, 3.00958811],
   [       np.nan, 3.05774663]], columns=['original', 'filled'])

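_, ax = plt.subplots()
# Solid line with markers for the original values
df.original.plot(marker='o', ax=ax)

# Rows where 'filled' is missing but the previous or next row is filled
m = (df.filled.isna() & df.filled.shift(1).notna()) | (df.filled.isna() & df.filled.shift(-1).notna())
# Copy the original value into those rows, then draw the dashed line in the same color
df.filled.fillna(df.loc[m, 'original']).plot(ls='--', ax=ax, color=ax.get_lines()[0].get_color())
plt.show()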

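To see which rows the mask m picks, here is a minimal sketch on made-up toy data. The frame `toy` and its values are assumptions for illustration only, not the question's data; the mask expression itself is the one used in the code above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: NaN marks the gaps in each column.
toy = pd.DataFrame({
    "original": [1.0, np.nan, np.nan, 4.0, 5.0, 6.0],
    "filled":   [np.nan, 2.0, 3.0, np.nan, np.nan, np.nan],
})

# True where 'filled' is missing but the row directly before or after is filled.
m = (toy.filled.isna() & toy.filled.shift(1).notna()) | (toy.filled.isna() & toy.filled.shift(-1).notna())

# fillna with a Series aligns on the index, so only the masked rows are filled.
bridged = toy.filled.fillna(toy.loc[m, "original"])

print(m.tolist())        # [True, False, False, True, False, False]
print(bridged.tolist())  # [1.0, 2.0, 3.0, 4.0, nan, nan]
```

Rows 0 and 3 receive the original value because they border the filled run in rows 1 and 2. Rows 4 and 5 stay NaN, so no dashed line is drawn between original points that the solid line already joins.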


The mask-based approach above is a clean solution for the general case. If the original values are drawn with a solid, opaque line and the line for the filled values is no wider than that, you can simply draw the fully filled series first and then draw the original values on top of it:
_, ax = plt.subplots()
df.filled.fillna(df.original).plot(ax=ax, color='blue', ls='--')
df.original.plot(marker='o', ax=ax, color='blue')

Upvotes: 1
