Speeding up loop over dataframes

Question

I have written the code given below. There are two Pandas dataframes: df contains columns timestamp_milli and pressure and df2 contains columns timestamp_milli and acceleration_z. Both dataframes have around 100'000 rows. In the code shown below I'm searching for each timestamp of each row of df the rows of df2 where the time difference lies within a range and is minimal.

Unfortunately the code is extremly slow. Moreover, I'm getting the following message originating from the line df_temp["timestamp_milli"] = df_temp["timestamp_milli"] - row["timestamp_milli"]:

SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead

How can I speedup the code and solve the warning?

acceleration = []
pressure = []

for index, row in df.iterrows():
    mask = (df2["timestamp_milli"] >= (row["timestamp_milli"] - 5)) & (df2["timestamp_milli"] <= (row["timestamp_milli"] + 5))
    df_temp = df2[mask]

    # Select closest point
    if len(df_temp) > 0:
        df_temp["timestamp_milli"] = df_temp["timestamp_milli"] - row["timestamp_milli"]
        df_temp["timestamp_milli"] = df_temp["timestamp_milli"].abs()

        df_temp = df_temp.loc[df_temp["timestamp_milli"] == df_temp["timestamp_milli"].min()]

        for index2, row2 in df_temp.iterrows():
            pressure.append(row["pressure"])
            acc = row2["acceleration_z"]
            acceleration.append(acc)

Madhur Yadav · Accepted Answer

I have faced a similar problem, using itertuples instead of iterrows shows significant reduction in time. why iterrows have issues. Hope this helps.

Speeding up loop over dataframes

Answers (1)

Related Questions