Reputation: 539
I have a problem:
import pandas
df1=pandas.DataFrame([['2017-1-22',25,None],['2017-1-23','',''],['2017-1-24',24,15]],columns=['date','high_tem','low_tem'])
df2=pandas.DataFrame([['2017-1-22',22,18],['2017-1-23',23,''],['2017-1-24',20,10]],columns=['date','high_tem','low_tem'])
df3=pandas.DataFrame([['2017-1-22',25,16],['2017-1-23',24,18],['2017-1-24',22,11]],columns=['date','high_tem','low_tem'])
df1, df2 and df3 look like this:
df1:
        date high_tem low_tem
0  2017-1-22       25    None
1  2017-1-23
2  2017-1-24       24      15

df2:
        date high_tem low_tem
0  2017-1-22       22      18
1  2017-1-23       23
2  2017-1-24       20      10

df3:
        date high_tem low_tem
0  2017-1-22       25      16
1  2017-1-23       24      18
2  2017-1-24       22      11
I am trying to get a result like this:
        date high_tem low_tem
0  2017-1-22       22      18
1  2017-1-23       24      18
2  2017-1-24       24      15
The first row of df1 (index 0) contains None, so I replace it with the first row of df2, which has the same date.
The second row of df1 (index 1) contains empty strings, and the second row of df2 also contains an empty string, so I replace it with the second row of df3.
The third row of df1 contains neither None nor an empty string, so I leave that row unchanged.
Thanks
Upvotes: 1
Views: 498
Reputation: 29729
You could loop through each DataFrame and replace None and empty strings with np.nan. Whenever a row contains any NaN, set both temperature columns in that row to NaN.
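import numpy as np
for df in [df1, df2, df3]:
    # Treat None and empty strings as missing values
    df.replace({None: np.nan, "": np.nan}, inplace=True)
    # If any value in a row is missing, blank out both temperature columns
    df.loc[df.isnull().any(axis=1), ['high_tem', 'low_tem']] = np.nan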
Following this, df1, df2 and df3 would be modified accordingly.
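To make that masking step concrete, here is a minimal sketch of it applied to a copy of df1 only. The names value_cols, cleaned and incomplete are mine, not from the question, and it works on a copy rather than in place, which is a small departure from the loop above. It also relies on the assumption that only the two temperature columns can hold None or empty strings.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    [['2017-1-22', 25, None], ['2017-1-23', '', ''], ['2017-1-24', 24, 15]],
    columns=['date', 'high_tem', 'low_tem'],
)

# Assumption: only these two columns can contain None or ''
value_cols = ['high_tem', 'low_tem']

cleaned = df1.copy()
# Turn empty strings into NaN; None is already treated as missing by isna()
cleaned[value_cols] = cleaned[value_cols].replace('', np.nan)

# True for every row that has at least one missing temperature
incomplete = cleaned[value_cols].isna().any(axis=1)

# Blank out both temperatures in those rows so the row is replaced as a whole
cleaned.loc[incomplete, value_cols] = np.nan

print(incomplete.tolist())
print(cleaned)
```

Rows 0 and 1 of df1 end up with both temperatures missing, while row 2 is left untouched, so only rows 0 and 1 are candidates for being filled from df2 or df3.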
Then use DataFrame.combine_first() in df1 → df2 → df3 order to fill the missing values.
df1.combine_first(df2).combine_first(df3)
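For reference, here is a self-contained sketch that puts both steps together on the sample data from the question. The helper blank_incomplete_rows is a hypothetical name of mine, and it returns cleaned copies instead of modifying the frames in place. It assumes all three frames share the same index and the same dates per row, as they do in the question.

```python
import numpy as np
import pandas as pd

cols = ['date', 'high_tem', 'low_tem']
df1 = pd.DataFrame([['2017-1-22', 25, None], ['2017-1-23', '', ''], ['2017-1-24', 24, 15]], columns=cols)
df2 = pd.DataFrame([['2017-1-22', 22, 18], ['2017-1-23', 23, ''], ['2017-1-24', 20, 10]], columns=cols)
df3 = pd.DataFrame([['2017-1-22', 25, 16], ['2017-1-23', 24, 18], ['2017-1-24', 22, 11]], columns=cols)


def blank_incomplete_rows(df, value_cols=('high_tem', 'low_tem')):
    """Hypothetical helper: return a copy in which every row that has a
    None or '' in value_cols gets all of value_cols set to NaN."""
    value_cols = list(value_cols)
    out = df.copy()
    # Empty strings become NaN; None already counts as missing
    out[value_cols] = out[value_cols].replace('', np.nan)
    incomplete = out[value_cols].isna().any(axis=1)
    out.loc[incomplete, value_cols] = np.nan
    return out


frames = [blank_incomplete_rows(df) for df in (df1, df2, df3)]

# Earlier frames take priority; later frames only fill what is still missing
result = frames[0].combine_first(frames[1]).combine_first(frames[2])
print(result)
```

On this data the temperatures come out as 22/18, 24/18 and 24/15 for the three dates, which matches the table in the question. Depending on the pandas version the columns may be displayed as floats or objects, so an explicit astype may be wanted afterwards.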
Upvotes: 1