Reputation: 173

How to update a dataframe from a new dataframe that has an additional column?

In the help page for pandas.DataFrame.update they have the following example to update df with new_df:

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [400, 500, 600]})
new_df = pd.DataFrame({'B': [4, 5, 6],
                       'C': [7, 8, 9]})
df.update(new_df)
df
   A  B
0  1  4
1  2  5
2  3  6

What is the best way to get 'C' to show up as a new column in df using the update command? I could not figure it out, but if I create the new column in df using either of the following:

df['C'] = np.NaN
df['C'] = 0

that seems to populate it, but I'm wondering if there is a better way or some way to do it in the df.update.

Upvotes: 3

Answers (3)

SeaBean

Reputation: 23237

This answer is aimed at discussing the differences of DataFrame.update() with various similar functions, such as .combine_first() and .assign() discussed in this post.

An example is sometimes better than thousand words.

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [400, 500, np.nan]})
new_df = pd.DataFrame({'B': [4, np.nan, 6],
                       'C': [7, 8, 9]})

df
   A      B
0  1  400.0
1  2  500.0
2  3    NaN

new_df
     B  C
0  4.0  7
1  NaN  8
2  6.0  9

Using `.update()`:

df.update(new_df)
   A      B
0  1    4.0
1  2  500.0              <==  500 from column B of df is NOT replaced by NaN from new_df
2  3    6.0

Note also that df after the update (in place) still contains 2 columns only.

Using `.combine_first()`

df.combine_first(new_df)
   A      B  C
0  1  400.0  7
1  2  500.0  8
2  3    6.0  9           <==  Only NaN entry in column B of df got replaced

Note that we have 3 columns in the result here.

Next, let's try reversing the subject and object of the call:

new_df.combine_first(df)
   A      B  C
0  1    4.0  7
1  2  500.0  8           <==  Only NaN entry in column B of new_df got replaced
2  3    6.0  9

Note that we have 3 columns in the result here. Different from the result of .update()

This is among all the nearest to the result of .update() but still with one column more. Moreover, we have to reverse the subject and object in the call: we want to update df but have to call using new_df to get similar result. This is not so intuitive.

Using `.assign()`

df.assign(**new_df)
   A    B  C
0  1  4.0  7
1  2  NaN  8             <==  500 from column B of df got replaced by NaN from new_df
2  3  6.0  9

Note that we have 3 columns in the result here. Different from the result of .update()

How to get column `C` to show up as a new column in `df` using `.update()` command?

Since only values at matching index/column labels are updated, we can achieve this by using .reindex() on the union of both row and column indexes before using .update() command, as follows:

df = df.reindex(index=df.index.union(new_df.index), 
                columns=df.columns.union(new_df.columns))
df.update(new_df)
   A      B    C
0  1    4.0  7.0
1  2  500.0  8.0
2  3    6.0  9.0

This works better than manually creating missing columns and rows in df for there could be multiple row and column indexes to add for large datasets.

Upvotes: 1

RavinderSingh13

Reputation: 133770

There is another way where we could use combine_first and make sure right DataFrames are passed to it. If you pass it correct manner then you will get expected output.

new_df.combine_first(df)
    A   B   C
0   1   4   7
1   2   5   8
2   3   6   9

Upvotes: 4

Ch3steR

Reputation: 20689

df.update Aligns on indices. You can use df.assign to overwrite the previous value and adding new columns too.

df.assign(**new_df)

   A  B  C
0  1  4  7
1  2  5  8
2  3  6  9

Upvotes: 4

How to update a dataframe from a new dataframe that has an additional column?

Answers (3)

Using .update():

Using .combine_first()

Using .assign()

How to get column C to show up as a new column in df using .update() command?

Related Questions

Using `.update()`:

Using `.combine_first()`

Using `.assign()`

How to get column `C` to show up as a new column in `df` using `.update()` command?