Reputation: 13
I am working with a large dataset of fish movement. Each row represents a fish location, with a timestamp, fish ID, etc. associated with that location. I have calculated distances between consecutive locations (typically every 2-3 seconds) with code that computes, for each row, the distance between the current XY position and the previous XY position (i.e. `distance = euclidean_distance(PosY, lag(PosY), PosX, lag(PosX))`).
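For reference, a minimal sketch of that distance step, assuming a data frame `df` with columns `ID`, `PosX`, and `PosY` (the Euclidean form and the grouping by `ID` are assumptions on my part):

```r
library(dplyr)

# Example data; replace with the real dataset.
df <- data.frame(ID = c(1, 1), PosX = c(0, 3), PosY = c(0, 4))

# Per-fish distance between consecutive positions; the first row of
# each fish has no previous position, so its distance is NA.
df <- df %>%
  group_by(ID) %>%
  mutate(distance = sqrt((PosX - lag(PosX))^2 + (PosY - lag(PosY))^2)) %>%
  ungroup()
```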
However, the observations that begin each treatment period are separated from the preceding ones by large gaps in time (1-3 hours), which produces spuriously large distance values for those rows, since a fish is often in a completely different area an hour later.
I want to remove the first observation of each time period, since those typically produce large distance values. The only way I know how to do this is to subset the dataset based on the new distance column, as below:
df_final <- subset(df, subset = !(distance >= 2)) # Removes rows with distance values greater than or equal to 2.
Is there another way to subset my data frame so that the first observation for each hour (and for each fish ID) is removed, without accidentally removing naturally high distance values?
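One possible approach is to filter on the time gap itself rather than on the distance, so that only rows following a long pause are dropped. A minimal sketch, assuming columns `ID` and `Time` (a POSIXct timestamp) and a hypothetical 60-second gap threshold:

```r
library(dplyr)

# Example data: one fish with three closely spaced fixes, then a
# fix two hours later (the start of a new treatment period).
df <- data.frame(
  ID   = c(1, 1, 1, 1),
  Time = as.POSIXct("2024-01-01 00:00:00", tz = "UTC") + c(0, 2, 4, 7200)
)

df_final <- df %>%
  arrange(ID, Time) %>%
  group_by(ID) %>%
  # Seconds since the previous fix for the same fish.
  mutate(gap = as.numeric(difftime(Time, lag(Time), units = "secs"))) %>%
  # Drop each fish's first row (gap is NA) and any row that follows
  # a pause longer than the threshold, since the distance for such a
  # row spans the pause rather than real movement.
  filter(!is.na(gap) & gap <= 60) %>%
  ungroup() %>%
  select(-gap)
```

This keeps rows with naturally high distances as long as the fixes are still only seconds apart; the 60-second cutoff is a placeholder you would tune to your sampling interval.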
Upvotes: 0
Views: 51