Reputation: 1057
I have a df tracking movement of points each hour. I want to find the total distance traveled by that group/trial by adding the distance between the hourly coordinates, but I'm confusing myself with apply functions.
I want to say "in each group/trial, sum [distance(hour1-hou2), distance(hour2=hour3), distance(hour3-hour4)....] until current hour so on each line, I have a cumulative distance travelled value.
I've created a fake df below.
paths <- data.frame(matrix(nrow=80,ncol=5))
colnames(paths) <- c("trt","trial","hour","X","Y")
paths$trt <- rep(c("A","B","C","D"),each=20)
paths$trial <- rep(c(rep(1,times=10),rep(2,times=10)),times=4)
paths$hour <- rep(1:10,times=8)
paths[,4:5] <- runif(160,0,50)
#this shows the paths that I want to measure.
ggplot(data=paths,aes(x=X,y=Y,group=interaction(trt,trial),color=trt))+
geom_path()
I probably want to add a column paths$dist.traveled
to keep track each hour.
I think I could use apply
or maybe even aggregate
but I've been using PointDistance
to find the distances, so I'm a bit confused. I also would rather not do a loop inside a loop, because the real dataset is large.
Upvotes: 0
Views: 347
Reputation: 1136
Is this what you are trying to achieve?:
paths %>%
mutate(dist.traveled = sqrt((X-lag(X))^2 + (Y-lag(Y))^2))
trt trial hour X Y dist.traveled
<chr> <dbl> <int> <dbl> <dbl> <dbl>
1 A 1 1 11.2 26.9 NA
2 A 1 2 20.1 1.48 27.0
3 A 1 3 30.4 0.601 10.4
4 A 1 4 31.1 26.6 26.0
5 A 1 5 38.1 30.4 7.88
6 A 1 6 27.9 47.9 20.2
7 A 1 7 16.5 35.3 16.9
8 A 1 8 0.328 13.0 27.6
9 A 1 9 14.0 41.7 31.8
10 A 1 10 29.7 7.27 37.8
# ... with 70 more rows
paths$dist.travelled[which(paths$hour==1)] <- NA
paths %>%
group_by(trt)%>%
summarise(total_distance = sum(dist.traveled, na.rm = TRUE))
trt total_distance
<chr> <dbl>
1 A 492.
2 B 508.
3 C 479.
4 D 462.
I am adding the new column to calculate distances for each group, and them sum them up.
Upvotes: 1
Reputation: 228
Here's an answer that uses {dplyr}
:
library(dplyr)
paths %>%
arrange(trt, trial, hour) %>%
group_by(trt, trial) %>%
mutate(dist_travelled = sqrt((X - lag(X))^2 + (Y - lag(Y))^2)) %>%
mutate(total_dist = sum(dist_travelled, na.rm = TRUE)) %>%
ungroup()
If you wanted the total distance but grouped only by trt
and not trial
you would just remove that from the call to group_by()
.
Upvotes: 3