Milad-Irani
Milad-Irani

Reputation: 73

How to update pandas dataframe columns based on another dataframe faster?

I am using the below code to update a dataframe based on another one. However, it is dramatically slow. I am looking for a solution.

for inx, row in df1.iterrows():
    dfTmp = df2.loc[df2['KANR'].astype(str) == row['KANR']]
    if dfTmp.empty:
        continue

    if dfTmp.loc[dfTmp['STATUS'] == "F5"].empty is False:
        timestamp = "%s %s" % (dfTmp.loc[dfTmp['STATUS'].astype(str) == "F5"].iloc[0, ]["Date"],
                               dfTmp.loc[dfTmp['STATUS'].astype(str) == "F5"].iloc[0, ]["Time"])
        df1.set_value(inx, 'F5', timestamp)

Upvotes: 0

Views: 95

Answers (1)

akuiper
akuiper

Reputation: 214987

You can use merge, which is optimized for speed and will be much faster for this kind of match task, something like this, assuming you don't have duplicated date time for each KANR:

df2['F5'] = df2['Date'].astype(str) + " " + df2['Time'].astype(str)
to_join = df2.loc[df2['STATUS'].astype(str) == 'F5', ['F5', 'KANR']].groupby('KANR').head(1)
df1.merge(to_join, how='left', on = 'KANR')

Upvotes: 1

Related Questions