Reputation: 175
I have a problem for which I have a solution, but given the amount of data, it's very slow. Just putting it out there to see if there is a better solution than this.
So I have two dataframes:
df1
Group Date Value X Y
B 1/1/2020 3 9 4556
B 2/1/2020 4 28 43543
B 3/1/2020 5 7 232
A 1/1/2020 6 28 566
A 2/1/2020 7 47 342
A 3/1/2020 8 -6 546
C 1/1/2020 21 43 3434
C 2/1/2020 8 65 2323
C 3/1/2020 2 24 44
df2
Group Date Value
A 1/1/2020 1000
A 2/1/2020 900
A 3/1/2020 800
And I want to replace the "Value" column of df1 with the "Value" column of df2 where "Group" and "Date" match:
df1 = df1.set_index(['Group', 'Date'])
df2 = df2.set_index(['Group', 'Date'])
df1.update(df2)
df1 = df1.reset_index()
This gets me there, but the problem is that when df1 gets too big, it takes a while. I am running this code inside another program (Spotfire by TIBCO). Any ideas to improve efficiency? Thanks
Upvotes: 1
Views: 259
Reputation: 466
In case this approach is also helpful (since it looks like df2 may have come from some type of mapping), you could also use the map() function. First, create a dictionary that maps dates to values from df2, then apply it to df1["Date"] and assign the result to df1["Value"].
mapping = dict(zip(df2["Date"],df2["Value"))
df1["Value"] = df1["Date"].map(mapping)
If the mapping in df2 is different for each group, you could do the same thing as above with a slightly messier dictionary (tuples as keys). Then, you'll need to use apply() instead of map(); using .get() with the row's current value as the default keeps the original "Value" for rows that have no match in df2.
mapping = dict(zip(zip(df2["Group"], df2["Date"]), df2["Value"]))
df1["Value"] = df1.apply(lambda x: mapping.get((x["Group"], x["Date"]), x["Value"]), axis=1)
Upvotes: 1
Reputation: 9857
Assuming you want to join the two data frames based on Group and Date, you could try using merge.
df1 = df1.merge(df2, how='left', on=['Group', 'Date'])
df1['Value'] = df1['Value_y'].fillna(df1['Value_x'])
df1 = df1[['Group', 'Date', 'Value', 'X', 'Y']]
print(df1)
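For reference, here is a self-contained version of the above using the sample frames from the question (dates kept as plain strings here; the merge works the same with datetimes as long as both frames use the same type):
import pandas as pd

df1 = pd.DataFrame({
    "Group": ["B", "B", "B", "A", "A", "A", "C", "C", "C"],
    "Date":  ["1/1/2020", "2/1/2020", "3/1/2020"] * 3,
    "Value": [3, 4, 5, 6, 7, 8, 21, 8, 2],
    "X":     [9, 28, 7, 28, 47, -6, 43, 65, 24],
    "Y":     [4556, 43543, 232, 566, 342, 546, 3434, 2323, 44],
})
df2 = pd.DataFrame({
    "Group": ["A", "A", "A"],
    "Date":  ["1/1/2020", "2/1/2020", "3/1/2020"],
    "Value": [1000, 900, 800],
})

# left merge keeps every row of df1; matching rows get df2's Value as Value_y
df1 = df1.merge(df2, how='left', on=['Group', 'Date'])
df1['Value'] = df1['Value_y'].fillna(df1['Value_x'])
df1 = df1[['Group', 'Date', 'Value', 'X', 'Y']]
print(df1)
In the output, group A picks up 1000/900/800 from df2 while groups B and C keep their original values.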
Upvotes: 3