Reputation: 13
I am working with a large dataset of fish movement. Each row represents a fish location, with a timestamp, fish ID, etc. associated with that location. I have calculated distances between consecutive locations (typically every 2-3 seconds) with code that computes, for each row, the distance between the current XY position and the previous XY position (i.e. `distance = euclidean_distance(PosY, lag(PosY), PosX, lag(PosX))`).
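For reference, a minimal sketch of that distance step, assuming a data frame `df` with columns `ID`, `PosX`, and `PosY` (the Euclidean form and the grouping by `ID` are assumptions on my part):

```r
library(dplyr)

# Example data; replace with the real dataset.
df <- data.frame(ID = c(1, 1), PosX = c(0, 3), PosY = c(0, 4))

# Per-fish distance between consecutive positions; the first row of
# each fish has no previous position, so its distance is NA.
df <- df %>%
  group_by(ID) %>%
  mutate(distance = sqrt((PosX - lag(PosX))^2 + (PosY - lag(PosY))^2)) %>%
  ungroup()
```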
However, the observations that begin each treatment period are separated from the preceding ones by large gaps in time (1-3 hours), which produces spuriously large distance values for those rows, since a fish is often in a completely different area an hour later.
I want to remove the first observation of each time period, since those typically produce large distance values. The only way I know how to do this is to subset the dataset based on the new distance column, as below:
df_final <- subset(df, subset = !(distance >= 2)) # Removes rows with distance values greater than or equal to 2.
Is there another way to subset my data frame so that the first observation for each hour (and for each fish ID) is removed, without accidentally removing naturally high distance values?
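One possible approach is to filter on the time gap itself rather than on the distance, so that only rows following a long pause are dropped. A minimal sketch, assuming columns `ID` and `Time` (a POSIXct timestamp) and a hypothetical 60-second gap threshold:

```r
library(dplyr)

# Example data: one fish with three closely spaced fixes, then a
# fix two hours later (the start of a new treatment period).
df <- data.frame(
  ID   = c(1, 1, 1, 1),
  Time = as.POSIXct("2024-01-01 00:00:00", tz = "UTC") + c(0, 2, 4, 7200)
)

df_final <- df %>%
  arrange(ID, Time) %>%
  group_by(ID) %>%
  # Seconds since the previous fix for the same fish.
  mutate(gap = as.numeric(difftime(Time, lag(Time), units = "secs"))) %>%
  # Drop each fish's first row (gap is NA) and any row that follows
  # a pause longer than the threshold, since the distance for such a
  # row spans the pause rather than real movement.
  filter(!is.na(gap) & gap <= 60) %>%
  ungroup() %>%
  select(-gap)
```

This keeps rows with naturally high distances as long as the fixes are still only seconds apart; the 60-second cutoff is a placeholder you would tune to your sampling interval.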
Upvotes: 0
Views: 51