samarth bhargava
samarth bhargava

Reputation: 31

Replace a subset of pandas data frame with another data frame

I have a data frame(DF1) with 100 columns.( one of the column is ID) I have one more data frame(DF2) with 30 columns.( one column is ID) I have to update the first 30 columns of the data frame(DF1) with the values in second data frame (DF2) keeping the rest of the values in the remaining columns of first data frame (DF1) intact.

update the first 30 column value in DF1 out of the 100 columns when the ID in second data frame (DF2) is present in first data frame (DF1).

Upvotes: 1

Views: 906

Answers (1)

Code Different
Code Different

Reputation: 93181

I tested this on Python 3.7 but I see no reason for it not to work on 2.7:

joined = df1.reset_index() \
            [['index', 'ID']] \
            .merge(df2, on='ID')
df1.loc[joined['index'], df1.columns[:30]] = joined.drop(columns=['index', 'ID'])

This assumes that df2 doesn't have a column called index or the merge will fail saying duplicate key with suffix.

Here a slow-motion of its inner workings:

  1. df1.reset_index() returns a dataframe same as df1 but with an additional column: index
  2. [['index', 'ID']] extracts a dataframe containing just these 2 columns from the dataframe in #1
  3. .merge(...) merges with df2 , matching on ID . The result (joined) is a dataframe with 32 columns: index, ID and the original 30 columns of df2.
  4. df1.loc[<row_indexes>, <column_names>] = <another_dataframe> mean you want to replace at those particular cells with data from another_dataframe. Since joined has 32 columns, we need to drop the extra 2 (index and ID)

Upvotes: 1

Related Questions