Reputation: 73
I am using the below code to update a dataframe based on another one. However, it is dramatically slow. I am looking for a solution.
for inx, row in df1.iterrows():
dfTmp = df2.loc[df2['KANR'].astype(str) == row['KANR']]
if dfTmp.empty:
continue
if dfTmp.loc[dfTmp['STATUS'] == "F5"].empty is False:
timestamp = "%s %s" % (dfTmp.loc[dfTmp['STATUS'].astype(str) == "F5"].iloc[0, ]["Date"],
dfTmp.loc[dfTmp['STATUS'].astype(str) == "F5"].iloc[0, ]["Time"])
df1.set_value(inx, 'F5', timestamp)
Upvotes: 0
Views: 95
Reputation: 214987
You can use merge
, which is optimized for speed and will be much faster for this kind of match task, something like this, assuming you don't have duplicated date time for each KANR
:
df2['F5'] = df2['Date'].astype(str) + " " + df2['Time'].astype(str)
to_join = df2.loc[df2['STATUS'].astype(str) == 'F5', ['F5', 'KANR']].groupby('KANR').head(1)
df1.merge(to_join, how='left', on = 'KANR')
Upvotes: 1