Reputation: 31
I have a data frame(DF1) with 100 columns.( one of the column is ID) I have one more data frame(DF2) with 30 columns.( one column is ID) I have to update the first 30 columns of the data frame(DF1) with the values in second data frame (DF2) keeping the rest of the values in the remaining columns of first data frame (DF1) intact.
update the first 30 column value in DF1 out of the 100 columns when the ID in second data frame (DF2) is present in first data frame (DF1).
Upvotes: 1
Views: 906
Reputation: 93181
I tested this on Python 3.7 but I see no reason for it not to work on 2.7:
joined = df1.reset_index() \
[['index', 'ID']] \
.merge(df2, on='ID')
df1.loc[joined['index'], df1.columns[:30]] = joined.drop(columns=['index', 'ID'])
This assumes that df2
doesn't have a column called index
or the merge
will fail saying duplicate key with suffix.
Here a slow-motion of its inner workings:
df1.reset_index()
returns a dataframe same as df1
but with an additional column: index
[['index', 'ID']]
extracts a dataframe containing just these 2 columns from the dataframe in #1.merge(...)
merges with df2
, matching on ID
. The result (joined
) is a dataframe with 32 columns: index
, ID
and the original 30 columns of df2
.df1.loc[<row_indexes>, <column_names>] = <another_dataframe>
mean you want to replace at those particular cells with data from another_dataframe
. Since joined
has 32 columns, we need to drop the extra 2 (index
and ID
) Upvotes: 1