Dark Knight
Dark Knight

Reputation: 175

Pandas - Is there a better way to update column with another dataframe column

I have a problem, for which I have a solution, but given the amount of data, its very slow. Just putting it out there to see if there is a better solution than this.

So I have two dataframes

df1

Group   Date    Value   X   Y
B   1/1/2020    3   9   4556
B   2/1/2020    4   28  43543
B   3/1/2020    5   7   232
A   1/1/2020    6   28  566
A   2/1/2020    7   47  342
A   3/1/2020    8   -6  546
C   1/1/2020    21  43  3434
C   2/1/2020    8   65  2323
C   3/1/2020    2   24  44

df2

Group   Date    Value
A   1/1/2020    1000
A   2/1/2020    900
A   3/1/2020    800

And I want to replace the "Value" column of df1 with "Value" column of df2 with matching "Date"

df1=df1.set_index([‘Group’,'Date'])
df2=df2.set_index(['Group','Date'])
df1.update(df2)
df1=df1.reset_index()

This gets me there, but the problem is when df1 gets too big, it takes a while. I am running this code in another program (Spotfire by TIBCO). Any ideas to improve efficiency? Thanks

Upvotes: 1

Views: 259

Answers (2)

Leo
Leo

Reputation: 466

In case this approach is also helpful (since it looks like df2 may have come from some type of mapping), you could also use the map() function. First, create a dictionary or function that maps dates to values, given by df2, then apply it to df1 based on df1["Date"] and df1["Value"].

mapping = dict(zip(df2["Date"],df2["Value"))
df1["Value"] = df1["Date"].map(mapping)

If the mapping in df2 is different for each group, you could do the same thing as above with a slightly messier dictionary (tuples as keys). Then, you'll need to use apply() instead of map().

mapping = dict( zip( zip(df2["Group"],df2["Date"]), df2["Value"] )
df1["Value"] = df1.apply( lambda x: mapping[ x["Group"],x["Date"] ], axis=1 )

Upvotes: 1

norie
norie

Reputation: 9857

Assuming you want to join the two data frames based on Group and Date you could try using merge.

df1= df1.merge(df2, how='left', on=['Group', 'Date'])

df1['Value'] = df1['Value_y'].fillna(df1['Value_x'])

df1 = df1[['Group', 'Date', 'Value', 'X', 'Y']]

print(df1)

Upvotes: 3

Related Questions