Reputation: 175
I have a problem for which I have a solution, but given the amount of data, it's very slow. Just putting it out there to see if there is a better solution than this.
So I have two dataframes:
df1
Group Date Value X Y
B 1/1/2020 3 9 4556
B 2/1/2020 4 28 43543
B 3/1/2020 5 7 232
A 1/1/2020 6 28 566
A 2/1/2020 7 47 342
A 3/1/2020 8 -6 546
C 1/1/2020 21 43 3434
C 2/1/2020 8 65 2323
C 3/1/2020 2 24 44
df2
Group Date Value
A 1/1/2020 1000
A 2/1/2020 900
A 3/1/2020 800
And I want to replace the "Value" column of df1 with the "Value" column of df2 where "Group" and "Date" match:
df1 = df1.set_index(['Group', 'Date'])
df2 = df2.set_index(['Group', 'Date'])
df1.update(df2)
df1 = df1.reset_index()
This gets me there, but the problem is that when df1 gets too big, it takes a while. I am running this code inside another program (Spotfire by TIBCO). Any ideas to improve efficiency? Thanks
Upvotes: 1
Views: 259
Reputation: 466
In case this approach is also helpful (since it looks like df2 may have come from some type of mapping), you could also use the map() function. First, create a dictionary that maps dates to values from df2, then apply it to df1["Date"] and assign the result to df1["Value"].
mapping = dict(zip(df2["Date"],df2["Value"))
df1["Value"] = df1["Date"].map(mapping)
If the mapping in df2 is different for each group, you could do the same thing as above with a slightly messier dictionary (tuples as keys). Then, you'll need to use apply() instead of map(); using .get() with the row's current value as the default keeps the original "Value" for rows that have no match in df2.
mapping = dict(zip(zip(df2["Group"], df2["Date"]), df2["Value"]))
df1["Value"] = df1.apply(lambda x: mapping.get((x["Group"], x["Date"]), x["Value"]), axis=1)
Upvotes: 1
Reputation: 9857
Assuming you want to join the two data frames based on Group and Date, you could try using merge.
df1 = df1.merge(df2, how='left', on=['Group', 'Date'])
df1['Value'] = df1['Value_y'].fillna(df1['Value_x'])
df1 = df1[['Group', 'Date', 'Value', 'X', 'Y']]
print(df1)
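For reference, here is a self-contained version of the above using the sample frames from the question (dates kept as plain strings here; the merge works the same with datetimes as long as both frames use the same type):
import pandas as pd

df1 = pd.DataFrame({
    "Group": ["B", "B", "B", "A", "A", "A", "C", "C", "C"],
    "Date":  ["1/1/2020", "2/1/2020", "3/1/2020"] * 3,
    "Value": [3, 4, 5, 6, 7, 8, 21, 8, 2],
    "X":     [9, 28, 7, 28, 47, -6, 43, 65, 24],
    "Y":     [4556, 43543, 232, 566, 342, 546, 3434, 2323, 44],
})
df2 = pd.DataFrame({
    "Group": ["A", "A", "A"],
    "Date":  ["1/1/2020", "2/1/2020", "3/1/2020"],
    "Value": [1000, 900, 800],
})

# left merge keeps every row of df1; matching rows get df2's Value as Value_y
df1 = df1.merge(df2, how='left', on=['Group', 'Date'])
df1['Value'] = df1['Value_y'].fillna(df1['Value_x'])
df1 = df1[['Group', 'Date', 'Value', 'X', 'Y']]
print(df1)
In the output, group A picks up 1000/900/800 from df2 while groups B and C keep their original values.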
Upvotes: 3