Reputation: 6290
I have written the code given below. There are two Pandas dataframes: df contains the columns timestamp_milli and pressure, and df2 contains the columns timestamp_milli and acceleration_z. Both dataframes have around 100,000 rows. In the code shown below, for each row of df I search df2 for the rows whose timestamp lies within a range (±5) of that row's timestamp and whose time difference is minimal.
Unfortunately, the code is extremely slow. Moreover, I get the following warning from the line df_temp["timestamp_milli"] = df_temp["timestamp_milli"] - row["timestamp_milli"]:
SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead
How can I speed up the code and get rid of the warning?
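A minimal sketch of where the warning comes from and two ways to avoid it. The toy values are invented for illustration. `df2[mask]` returns a new frame, but pandas cannot tell whether you meant to modify `df2` through it, so assigning a column on that result triggers SettingWithCopyWarning in pandas versions that still emit it. Taking an explicit `.copy()`, or computing the distances as a separate Series instead of overwriting a column, avoids the ambiguity.

```python
import pandas as pd

# Toy stand-in for df2 (values are illustrative only)
df2 = pd.DataFrame({
    "timestamp_milli": [100, 103, 107, 112],
    "acceleration_z": [0.1, 0.2, 0.3, 0.4],
})
t = 105  # one timestamp taken from df

# between() is inclusive on both ends, same as the >= / <= mask in the question
mask = df2["timestamp_milli"].between(t - 5, t + 5)

# Option 1: explicit copy, so writing to df_temp can never affect df2
df_temp = df2[mask].copy()
df_temp["timestamp_milli"] = (df_temp["timestamp_milli"] - t).abs()

# Option 2: leave the slice untouched and compute distances as a separate Series
candidates = df2[mask]
dist = (candidates["timestamp_milli"] - t).abs()
closest = candidates[dist == dist.min()]  # keeps every row tied for the minimum
print(closest)
```

Option 2 also skips the extra column write, which matters a little when the loop runs 100,000 times.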
acceleration = []
pressure = []
for index, row in df.iterrows():
    mask = (df2["timestamp_milli"] >= (row["timestamp_milli"] - 5)) & (df2["timestamp_milli"] <= (row["timestamp_milli"] + 5))
    df_temp = df2[mask]
    # Select closest point
    if len(df_temp) > 0:
        df_temp["timestamp_milli"] = df_temp["timestamp_milli"] - row["timestamp_milli"]
        df_temp["timestamp_milli"] = df_temp["timestamp_milli"].abs()
        df_temp = df_temp.loc[df_temp["timestamp_milli"] == df_temp["timestamp_milli"].min()]
        for index2, row2 in df_temp.iterrows():
            pressure.append(row["pressure"])
            acc = row2["acceleration_z"]
            acceleration.append(acc)
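One way to drop the Python loop entirely, offered as a swapped-in technique rather than a fix of the loop itself: `pandas.merge_asof` with `direction="nearest"` and `tolerance=5` looks up the closest df2 timestamp within ±5 for every df row in a single vectorized call. This is a sketch under stated assumptions: both frames may be sorted by timestamp_milli (merge_asof requires sorted keys), timestamp_milli is an integer column (so the tolerance is an integer), and one match per df row is acceptable. Unlike the loop, it keeps only one df2 row when several are equally close. The data below is invented.

```python
import pandas as pd

# Toy stand-ins for the question's frames (values are illustrative only)
df = pd.DataFrame({"timestamp_milli": [100, 120, 140], "pressure": [1.0, 1.1, 1.2]})
df2 = pd.DataFrame({"timestamp_milli": [98, 119, 123, 160],
                    "acceleration_z": [0.5, 0.6, 0.7, 0.8]})

# merge_asof requires both frames to be sorted on the join key
left = df.sort_values("timestamp_milli")
right = df2.sort_values("timestamp_milli")

merged = pd.merge_asof(
    left,
    right,
    on="timestamp_milli",
    direction="nearest",  # closest df2 timestamp, earlier or later
    tolerance=5,          # no match if the closest one is more than 5 away
)

# df rows without a df2 timestamp within ±5 get NaN; drop them, as the loop skips them
merged = merged.dropna(subset=["acceleration_z"])
pressure = merged["pressure"].tolist()
acceleration = merged["acceleration_z"].tolist()
```

With two sorted inputs this is roughly a single merge pass instead of one full scan of df2 per df row, which is where the original code spends its time.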
Upvotes: 0
Views: 425
Reputation: 723
I have faced a similar problem; using itertuples instead of iterrows gave a significant reduction in runtime (why iterrows has issues). Hope this helps.
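A sketch of the question's loop rewritten along the lines of this answer, assuming the same column names and using invented toy data. `itertuples()` yields lightweight namedtuples, whereas `iterrows()` builds a Series for every row. The slice is also never assigned to, so the warning does not arise. Note that it still filters all of df2 once per df row, so the cost remains proportional to len(df) × len(df2); it removes per-row overhead but not the quadratic scan.

```python
import pandas as pd

# Toy stand-ins for the question's frames (values are illustrative only)
df = pd.DataFrame({"timestamp_milli": [100, 120, 140], "pressure": [1.0, 1.1, 1.2]})
df2 = pd.DataFrame({"timestamp_milli": [98, 102, 119, 160],
                    "acceleration_z": [0.5, 0.6, 0.7, 0.8]})

ts2 = df2["timestamp_milli"]  # extract once instead of on every iteration
acceleration = []
pressure = []
for row in df.itertuples(index=False):
    # attribute access works because the column names are valid identifiers
    mask = (ts2 >= row.timestamp_milli - 5) & (ts2 <= row.timestamp_milli + 5)
    if not mask.any():
        continue  # no df2 timestamp within ±5
    candidates = df2[mask]
    dist = (candidates["timestamp_milli"] - row.timestamp_milli).abs()
    # keep every row tied for the minimal distance, as the original loop does
    for acc in candidates.loc[dist == dist.min(), "acceleration_z"]:
        pressure.append(row.pressure)
        acceleration.append(acc)
```

For the first df row, 98 and 102 are equally close, so both accelerations are collected, matching the behaviour of the original code.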
Upvotes: 2