How to update pandas dataframe columns based on another dataframe faster?

Question

I am using the below code to update a dataframe based on another one. However, it is dramatically slow. I am looking for a solution.

for inx, row in df1.iterrows():
    dfTmp = df2.loc[df2['KANR'].astype(str) == row['KANR']]
    if dfTmp.empty:
        continue

    if dfTmp.loc[dfTmp['STATUS'] == "F5"].empty is False:
        timestamp = "%s %s" % (dfTmp.loc[dfTmp['STATUS'].astype(str) == "F5"].iloc[0, ]["Date"],
                               dfTmp.loc[dfTmp['STATUS'].astype(str) == "F5"].iloc[0, ]["Time"])
        df1.set_value(inx, 'F5', timestamp)

akuiper · Accepted Answer

You can use merge, which is optimized for speed and will be much faster for this kind of match task, something like this, assuming you don't have duplicated date time for each KANR:

df2['F5'] = df2['Date'].astype(str) + " " + df2['Time'].astype(str)
to_join = df2.loc[df2['STATUS'].astype(str) == 'F5', ['F5', 'KANR']].groupby('KANR').head(1)
df1.merge(to_join, how='left', on = 'KANR')

How to update pandas dataframe columns based on another dataframe faster?

Answers (1)

Related Questions