SamCie

How to replace missing data in DataFrame

Let's say I have the following DataFrame:

import numpy as np
import pandas as pd
df = pd.DataFrame({'col1': [241, 123, 423], 'col2': [977, 78, np.nan], 'col3': [76, 432, np.nan], 'col4': [234, 321, 987]}, index=pd.date_range('2019-1-1', periods=3, freq="D")).rename_axis('Date')

which outputs:

            col1   col2   col3  col4
Date                                
2019-01-01   241  977.0   76.0   234
2019-01-02   123   78.0  432.0   321
2019-01-03   423    NaN    NaN   987

Another DataFrame, df2 (or even a Series), holds the missing values for col2 and col3. How can I replace the NaN values in df with the values from df2?

df2 = pd.DataFrame({'col2': 111, 'col3': 222}, index=[pd.to_datetime('2019-1-3')]).rename_axis('Date')

which looks like:

            col2  col3
Date                  
2019-01-03   111   222

The final DataFrame I want should look like this:

            col1   col2   col3  col4
Date                                
2019-01-01   241  977.0   76.0   234
2019-01-02   123   78.0  432.0   321
2019-01-03   423    111    222   987

Answers (2)

BENY

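As an alternative, use combine_first. Values from df2 take priority, and every cell df2 lacks (or holds as NaN) is filled from df: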

df2.combine_first(df)
Out[8]: 
             col1   col2   col3   col4
Date                                  
2019-01-01  241.0  977.0   76.0  234.0
2019-01-02  123.0   78.0  432.0  321.0
2019-01-03  423.0  111.0  222.0  987.0
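In the output above, col1 and col4 come back as floats: df2 has no such columns, so they start out as all-NaN while the two frames are aligned. Whether the integer dtype is restored afterwards depends on the pandas version. A minimal sketch, assuming you want the result to keep df's column order and dtypes (`result` is just an illustrative name):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [241, 123, 423], 'col2': [977, 78, np.nan],
                   'col3': [76, 432, np.nan], 'col4': [234, 321, 987]},
                  index=pd.date_range('2019-1-1', periods=3, freq="D")).rename_axis('Date')
df2 = pd.DataFrame({'col2': 111, 'col3': 222},
                   index=[pd.to_datetime('2019-1-3')]).rename_axis('Date')

# Values from df2 win; every cell df2 does not cover (or has as NaN) comes from df.
result = df2.combine_first(df)

# Put the columns back in df's order and cast back to df's dtypes.
# The cast is safe here only because no NaN remains after combining.
result = result.reindex(columns=df.columns).astype(df.dtypes.to_dict())
print(result)
print(result.dtypes)
```

If some cells could still be NaN after combining, casting an integer column back would fail, so skip the `astype` step in that case.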

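Or use update, which modifies df in place and returns None. By default (overwrite=True) it replaces every cell for which df2 has a non-NA value, not only the NaN cells; pass overwrite=False to change only the cells that are missing in df: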

df.update(df2)
df
Out[10]: 
            col1   col2   col3  col4
Date                                
2019-01-01   241  977.0   76.0   234
2019-01-02   123   78.0  432.0   321
2019-01-03   423  111.0  222.0   987
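With this df2 both overwrite settings give the same result, because df2 only has values where df is NaN. The difference shows up once the replacement frame also covers cells that already hold data. A small sketch, using a hypothetical `patch` frame that carries a new value for the non-missing 2019-01-02 cell of col2:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [241, 123, 423], 'col2': [977, 78, np.nan],
                   'col3': [76, 432, np.nan], 'col4': [234, 321, 987]},
                  index=pd.date_range('2019-1-1', periods=3, freq="D")).rename_axis('Date')

# Hypothetical replacement frame: the 2019-01-03 row fills the NaNs,
# the 2019-01-02 row holds a value for a cell that is NOT missing in df.
patch = pd.DataFrame({'col2': [5.0, 111.0], 'col3': [np.nan, 222.0]},
                     index=pd.to_datetime(['2019-01-02', '2019-01-03']))

overwritten = df.copy()
overwritten.update(patch)                    # default overwrite=True: 78.0 becomes 5.0
only_missing = df.copy()
only_missing.update(patch, overwrite=False)  # only cells that are NaN in df change

print(overwritten)
print(only_missing)
```

Working on copies keeps the original df untouched, since update itself never returns a new frame.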

ansev

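We can use DataFrame.fillna. When given a DataFrame, it fills each NaN in df with the value at the same row and column labels in df2: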

df = df.fillna(df2)
print(df)

            col1   col2   col3  col4
Date                                
2019-01-01   241  977.0   76.0   234
2019-01-02   123   78.0  432.0   321
2019-01-03   423  111.0  222.0   987
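Because fillna with a DataFrame matches values by label rather than by position, df2 does not need the same shape or column order as df, and labels that exist only in df2 are ignored. A quick check of that, with a hypothetical `other` frame whose columns are reordered and which has an extra column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [241, 123, 423], 'col2': [977, 78, np.nan],
                   'col3': [76, 432, np.nan], 'col4': [234, 321, 987]},
                  index=pd.date_range('2019-1-1', periods=3, freq="D")).rename_axis('Date')

# Hypothetical fill source: columns in a different order plus one df does not have.
other = pd.DataFrame({'col3': [222.0], 'extra': [0.0], 'col2': [111.0]},
                     index=pd.to_datetime(['2019-01-03']))

filled = df.fillna(other)  # aligned on row/column labels; 'extra' is dropped
print(filled)
```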

If you have a Series indexed by column name instead, like the one obtained with df2.iloc[0], fillna works with it too; each column's NaN values are filled with the matching value:

my_serie=df2.iloc[0]
print(my_serie)
col2    111
col3    222
Name: 2019-01-03 00:00:00, dtype: int64

print(df.fillna(my_serie))
            col1   col2   col3  col4
Date                                
2019-01-01   241  977.0   76.0   234
2019-01-02   123   78.0  432.0   321
2019-01-03   423  111.0  222.0   987
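One difference worth knowing: a Series (or a plain dict) passed to fillna is matched by column only, so its value fills every NaN in that column whatever the row, whereas a DataFrame fills only the cells at matching row and column labels. A sketch with a hypothetical `gappy` frame that is missing col2 on two dates:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with col2 missing on two different dates.
gappy = pd.DataFrame({'col1': [241, 123, 423], 'col2': [977, np.nan, np.nan]},
                     index=pd.date_range('2019-1-1', periods=3, freq="D")).rename_axis('Date')
df2 = pd.DataFrame({'col2': 111, 'col3': 222},
                   index=[pd.to_datetime('2019-1-3')]).rename_axis('Date')

by_column = gappy.fillna(df2.iloc[0])         # Series: both NaNs in col2 become 111
by_column_dict = gappy.fillna({'col2': 111})  # a dict behaves the same way
by_label = gappy.fillna(df2)                  # DataFrame: only 2019-01-03 is filled

print(by_column)
print(by_label)
```

In both cases keys or columns that gappy does not have (col3 here) are simply skipped.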
