golanor

Reputation: 93

pandas df.fillna - filling NaNs after outer join with correct values

I have two dataframes that share some columns.
I'm trying to:

1) Merge the two dataframes, i.e. add the columns from df2 that df1 doesn't have:

diff = df2[df2.columns.difference(df1.columns)]
merged = pd.merge(df1, diff, how='outer', sort=False, on='ID')
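A minimal sketch of step 1, assuming ID is an ordinary column present in both frames (the data below is hypothetical). Because columns.difference drops every shared column, including ID, this version adds the key back before merging so the right-hand frame still has an ID column to join on.

```python
import pandas as pd

# Hypothetical data: ID is a regular column in both frames.
df1 = pd.DataFrame({"ID": [1, 2], "a": [10, 20]})
df2 = pd.DataFrame({"ID": [2, 3], "a": [21, 31], "b": [200, 300]})

# Columns only df2 has; the key is re-added because difference() removes it.
new_cols = df2.columns.difference(df1.columns).tolist()
diff = df2[["ID"] + new_cols]

merged = pd.merge(df1, diff, how="outer", on="ID", sort=False)
# ID 3 now has a NaN in "a" even though df2 holds 31 for it: that is the gap step 2 tries to fill.
print(merged)
```

Merging on a column gives merged a fresh 0..n-1 index that is unrelated to df2's index, so a later fillna(value=df2) pairs rows by those index labels rather than by ID unless both frames are first indexed by ID.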

Up to here, everything works as expected.

2) Replace the NaN values in merged with the corresponding values from df2:

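merged = merged[~merged.index.duplicated(keep='first')]  # drop rows whose index label repeats
merged = merged.fillna(value=df2)  # fill NaNs from df2, aligned on index and column labels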

This is where I get:

pandas.core.indexes.base.InvalidIndexError

I don't have any duplicates, and I can't find any information about what causes this error.
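InvalidIndexError is typically raised when pandas has to reindex against an index that contains duplicate labels, and fillna(value=df2) aligns merged and df2 on both index and column labels. The deduplication line above only touches merged's index, so duplicates in df2's index (or in either frame's columns) are one plausible cause. A small check like the one below can confirm or rule that out; report_index_problems is a hypothetical helper, not a pandas function, and the frames are made-up data.

```python
import pandas as pd


def report_index_problems(name: str, df: pd.DataFrame) -> list[str]:
    """Hypothetical helper: list duplicate index/column labels that break alignment."""
    problems = []
    if not df.index.is_unique:
        dupes = df.index[df.index.duplicated()].unique().tolist()
        problems.append(f"{name}: duplicate index labels {dupes}")
    if not df.columns.is_unique:
        dupes = df.columns[df.columns.duplicated()].unique().tolist()
        problems.append(f"{name}: duplicate column labels {dupes}")
    return problems


# Made-up frames: merged is clean, df2 repeats the index label 0.
merged = pd.DataFrame({"a": [None, 1.0]}, index=[0, 1])
df2 = pd.DataFrame({"a": [5.0, 6.0, 7.0]}, index=[0, 0, 1])

for frame_name, frame in [("merged", merged), ("df2", df2)]:
    for note in report_index_problems(frame_name, frame):
        print(note)
```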

Upvotes: 2

Views: 2841

Answers (2)

golanor

Reputation: 93

The solution is to use a different method, combine_first(). It fills each missing value with the value at the same row and column label in the other dataframe, as described in the pandas docs under "Merging together values within Series or DataFrame columns".
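A minimal sketch of the combine_first approach, assuming ID is the join key and using hypothetical data. combine_first aligns on the index, so both frames are indexed by ID first; cells that already have a value in merged are kept, and only NaN cells are taken from df2.

```python
import numpy as np
import pandas as pd

# Hypothetical frames: merged came out of the outer join, df2 is the fill source.
merged = pd.DataFrame({"ID": [1, 2, 3],
                       "a": [10.0, 20.0, np.nan],
                       "b": [np.nan, 200.0, 300.0]})
df2 = pd.DataFrame({"ID": [2, 3], "a": [21.0, 31.0], "b": [200.0, 300.0]})

# Index both frames by the key so values are matched by ID, not by row position.
filled = merged.set_index("ID").combine_first(df2.set_index("ID"))
print(filled)
```

ID 3's missing "a" is filled with 31.0, while ID 1's "b" stays NaN because df2 has no row for ID 1. Note that the result covers the union of both indexes, so IDs that appear only in df2 would be added as new rows.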

Upvotes: 3

Venkatachalam

Reputation: 16966

If the number of rows changes because of the merge, fillna can raise an error. Try the following:

merged = merged.fillna(df2.groupby(level=0).transform("mean"))
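For comparison, a sketch that makes the fill source unique before calling fillna, using hypothetical data. groupby(...).transform("mean") returns an object with the same shape and index as df2, so any duplicate labels survive; aggregating with groupby(level=0).mean() (a swapped-in variation, not the exact code above) collapses them to one row per label. For non-numeric columns, .first() would be the analogous choice.

```python
import numpy as np
import pandas as pd

# Hypothetical frames: df2 repeats the label 2 in its index.
merged = pd.DataFrame({"a": [np.nan, np.nan]}, index=[1, 2])
df2 = pd.DataFrame({"a": [1.0, 2.0, 4.0]}, index=[1, 2, 2])

# transform keeps df2's shape, so the duplicate label 2 is still present.
same_shape = df2.groupby(level=0).transform("mean")

# Aggregating yields one row per label (label 2 -> mean of 2.0 and 4.0).
fill_source = df2.groupby(level=0).mean()

result = merged.fillna(fill_source)
print(result)
```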

related question

Upvotes: 0
