geistmate
geistmate

Reputation: 555

Pandas How to replace all rows in certain columns without affecting the others?

I have a master Dataframe (df_a) with columns a-b-c-d-e. I have a second Dataframe (df_b) with new info that updates the values in columns a-b-c.

Minimal reproducible code:

df_a = pd.DataFrame(data={
    'a': [1, 0, 0, 0, 0, 0],
    'b': [2, 0, 0, 0, 0, 0], 
    'c': [3, 0, 0, 0, 0, 0],
    'd': ['foo', 'foo', 'foo', 'foo', 'foo', 'foo'],
    'e': ['bar','bar','bar','bar','bar','bar']
})

df_b = pd.DataFrame(data={
    'a': [1, 0, 2, 0, 0, 0],
    'b': [2, 0, 2, 0, 0, 0], 
    'c': [3, 0, 2, 0, 0, 0]
})

df_a:
    a   b   c   d   e
0   1   2   3   foo bar
1   0   0   0   foo bar
2   0   0   0   foo bar
3   0   0   0   foo bar
4   0   0   0   foo bar
5   0   0   0   foo bar

df_b:
    a   b   c
0   1   2   3
1   0   0   0
2   2   2   2
3   0   0   0
4   0   0   0
5   0   0   0

Desired output: 
df_a_new:
    a   b   c   d   e
0   1   2   3   foo bar
1   0   0   0   foo bar
2   2   2   2   foo bar
3   0   0   0   foo bar
4   0   0   0   foo bar
5   0   0   0   foo bar

The indexes will always be the same, so I typically would just use map() for single columns, but in my actual dataset there are 50 columns I have to update and it would not be as efficient.

Upvotes: 1

Views: 67

Answers (5)

Quang Hoang
Quang Hoang

Reputation: 150745

This is combine_first, which returns a new dataframe:

df_b.combine_first(df_a)

Or inplace change with update:

df_a.update(df_b)

Output:

   a  b  c    d    e
0  1  2  3  foo  bar
1  0  0  0  foo  bar
2  2  2  2  foo  bar
3  0  0  0  foo  bar
4  0  0  0  foo  bar
5  0  0  0  foo  bar

Upvotes: 1

Niru
Niru

Reputation: 21

df_a_new = df_a.iloc[:]
df_a_new.iloc[:,:3] = df_b.iloc[:,:3]

Upvotes: 1

Akshay Sehgal
Akshay Sehgal

Reputation: 19322

Just use an OR | operator between the 2 dataset. df_b|df_a[df_b.columns] basically considers 0 as False and returns non zero values for rows (if equivalent rows have values, then it prioritizes the first) -

df_a[df_b.columns] = df_a[df_b.columns] | df_b
print(df_a)
   a  b  c    d    e
0  1  2  3  foo  bar
1  0  0  0  foo  bar
2  2  2  2  foo  bar
3  0  0  0  foo  bar
4  0  0  0  foo  bar
5  0  0  0  foo  bar

Upvotes: 3

Anurag Dabas
Anurag Dabas

Reputation: 24314

You can make use of concat() method:-

pd.concat((df_b,df_a[df_a.columns[3:]]),axis=1)

Here df_a.columns[3:] gives Index(['d', 'e'])

Upvotes: 2

anky
anky

Reputation: 75080

A generic solution would look like :

Either find the difference of columns from df_a and df_b then assign them to df_b

df_b.assign(**df_a.loc[:,df_a.columns.difference(df_b.columns,sort=False)])

Or another way:

df_b.combine_first(df_a)

Upvotes: 4

Related Questions