Reputation: 555
I have a master Dataframe (df_a) with columns a-b-c-d-e. I have a second Dataframe (df_b) with new info that updates the values in columns a-b-c.
Minimal reproducible code:
df_a = pd.DataFrame(data={
'a': [1, 0, 0, 0, 0, 0],
'b': [2, 0, 0, 0, 0, 0],
'c': [3, 0, 0, 0, 0, 0],
'd': ['foo', 'foo', 'foo', 'foo', 'foo', 'foo'],
'e': ['bar','bar','bar','bar','bar','bar']
})
df_b = pd.DataFrame(data={
'a': [1, 0, 2, 0, 0, 0],
'b': [2, 0, 2, 0, 0, 0],
'c': [3, 0, 2, 0, 0, 0]
})
df_a:
a b c d e
0 1 2 3 foo bar
1 0 0 0 foo bar
2 0 0 0 foo bar
3 0 0 0 foo bar
4 0 0 0 foo bar
5 0 0 0 foo bar
df_b:
a b c
0 1 2 3
1 0 0 0
2 2 2 2
3 0 0 0
4 0 0 0
5 0 0 0
Desired output:
df_a_new:
a b c d e
0 1 2 3 foo bar
1 0 0 0 foo bar
2 2 2 2 foo bar
3 0 0 0 foo bar
4 0 0 0 foo bar
5 0 0 0 foo bar
The indexes will always be the same, so I typically would just use map() for single columns, but in my actual dataset there are 50 columns I have to update and it would not be as efficient.
Upvotes: 1
Views: 67
Reputation: 150745
This is combine_first
, which returns a new dataframe:
df_b.combine_first(df_a)
Or inplace change with update
:
df_a.update(df_b)
Output:
a b c d e
0 1 2 3 foo bar
1 0 0 0 foo bar
2 2 2 2 foo bar
3 0 0 0 foo bar
4 0 0 0 foo bar
5 0 0 0 foo bar
Upvotes: 1
Reputation: 19322
Just use an OR |
operator between the 2 dataset. df_b|df_a[df_b.columns]
basically considers 0 as False and returns non zero values for rows (if equivalent rows have values, then it prioritizes the first) -
df_a[df_b.columns] = df_a[df_b.columns] | df_b
print(df_a)
a b c d e
0 1 2 3 foo bar
1 0 0 0 foo bar
2 2 2 2 foo bar
3 0 0 0 foo bar
4 0 0 0 foo bar
5 0 0 0 foo bar
Upvotes: 3
Reputation: 24314
You can make use of concat()
method:-
pd.concat((df_b,df_a[df_a.columns[3:]]),axis=1)
Here df_a.columns[3:]
gives Index(['d', 'e'])
Upvotes: 2
Reputation: 75080
A generic solution would look like :
Either find the difference of columns from df_a
and df_b
then assign them to df_b
df_b.assign(**df_a.loc[:,df_a.columns.difference(df_b.columns,sort=False)])
Or another way:
df_b.combine_first(df_a)
Upvotes: 4