Reputation: 2085
Let's assume I have two dataframes:
df1 = pd.DataFrame({'A': [0, 1, 2, 3],
'B': [300, 400, 500, 600],
'C': [6, 7, 8, 9]})
df1.set_index('A', inplace=True)
df2 = pd.DataFrame({'A': [2, 3],
'B': [433, 555],
'C': [99, 99],
'D': [1, 2]})
df2.set_index('A', inplace=True)
I want to replace rows in df1
with rows in df2
based on index. My first attempts were:
df1.update(df2)
df1.loc[df2.index, :] = df2
However, it only replaces columns that excisted in df1
and doesn't add D
column from df2
. How can I add D
column as well with NaN
values for other rows?
Please note that my original data frames have hundreds of columns, and original df2
have many additional columns so any manual definitions of columns to be used/created will be enormously cumbersome.
Upvotes: 2
Views: 184
Reputation: 748
You can use the combine function as follows.
df_new = df1.combine(df2, func=lambda s1, s2: s2.fillna(s1))
This is very similar to the combine_first method mentioned in a previous answer but gives you a bit more freedom as you can write the function you want to apply yourself.
Upvotes: 1
Reputation: 17322
you can use:
df1.update(df2)
cols = df2.columns[~ df2.columns.isin(df1.columns)]
mask = df2.index.isin(df1.index)
df1[cols] = df2[cols][mask]
df1
output:
without pandas.DataFrame.update
:
df1_mask_index = df1.index.isin(df2.index)
df2_mask_index = df2.index.isin(df1.index)
same_cols = df1.columns[df1.columns.isin(df2.columns)]
diff_cols = df2.columns[~df2.columns.isin(df1.columns)]
df1[df1_mask_index] = df2[same_cols][df2_mask_index]
df1[diff_cols] = df2[diff_cols][df2_mask_index]
df1
output:
Upvotes: 0
Reputation: 153460
Let's use pd.DataFrame.combine_first
:
df2.combine_first(df1)
Output:
B C D
A
0 300.0 6.0 NaN
1 400.0 7.0 NaN
2 433.0 99.0 1.0
3 555.0 99.0 2.0
Upvotes: 3
Reputation: 862481
I think you need DataFrame.align
with DataFrame.update
:
df1 = pd.DataFrame({'A': [0, 1, 2, 3], 'B': [300, 400, 500, 600], 'E': [6, 7, 8, 9]})
df1.set_index('A', inplace=True)
df2 = pd.DataFrame({'A': [2, 5], 'B': [433, 555], 'C': [99, 99], 'D': [1, 2]})
df2.set_index('A', inplace=True)
print (df1)
B E
A
0 300 6
1 400 7
2 500 8
3 600 9
print (df2)
B C D
A
2 433 99 1
5 555 99 2
df1, df2 = df1.align(df2)
print (df1)
B C D E
A
0 300.0 NaN NaN 6.0
1 400.0 NaN NaN 7.0
2 500.0 NaN NaN 8.0
3 600.0 NaN NaN 9.0
5 NaN NaN NaN NaN
print (df2)
B C D E
A
0 NaN NaN NaN NaN
1 NaN NaN NaN NaN
2 433.0 99.0 1.0 NaN
3 NaN NaN NaN NaN
5 555.0 99.0 2.0 NaN
df1.update(df2)
print (df1)
B C D E
A
0 300.0 NaN NaN 6.0
1 400.0 NaN NaN 7.0
2 433.0 99.0 1.0 8.0
3 600.0 NaN NaN 9.0
5 555.0 99.0 2.0 NaN
EDIT: Because sorting problem is possible change align
to DataFrame.reindex
by Index.union
of columns and if necessary also by indexes:
df1 = pd.DataFrame({'E': [0, 1, 2, 3], 'A': [300, 400, 500, 600], 'C': [6, 7, 8, 9]})
df1.set_index('A', inplace=True)
df2 = pd.DataFrame({'A': [2, 5], 'C': [433, 555], 'B': [99, 99], 'D': [1, 2]})
df2.set_index('A', inplace=True)
idx = df1.index.union(df2.index, sort=False)
cols = df1.columns.union(df2.columns, sort=False)
df1 = df1.reindex(index=idx,columns=cols)
df2 = df2.reindex(index=idx,columns=cols)
print (df1)
E C B D
A
300 0.0 6.0 NaN NaN
400 1.0 7.0 NaN NaN
500 2.0 8.0 NaN NaN
600 3.0 9.0 NaN NaN
2 NaN NaN NaN NaN
5 NaN NaN NaN NaN
print (df2)
E C B D
A
300 NaN NaN NaN NaN
400 NaN NaN NaN NaN
500 NaN NaN NaN NaN
600 NaN NaN NaN NaN
2 NaN 433.0 99.0 1.0
5 NaN 555.0 99.0 2.0
df1.update(df2)
print (df1)
E C B D
A
300 0.0 6.0 NaN NaN
400 1.0 7.0 NaN NaN
500 2.0 8.0 NaN NaN
600 3.0 9.0 NaN NaN
2 NaN 433.0 99.0 1.0
5 NaN 555.0 99.0 2.0
Upvotes: 0