Jake L

Reputation: 1057

Sum total distance by groups

I have a data frame tracking the movement of points each hour. I want to find the total distance traveled by each group/trial by adding up the distances between consecutive hourly coordinates, but I'm confusing myself with the apply functions.

I want to say: "in each group/trial, sum [distance(hour1-hour2), distance(hour2-hour3), distance(hour3-hour4), ...] up to the current hour", so that each row has a cumulative distance-traveled value.
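Written out, the value wanted on the row for hour t of group g is the formula below. The notation D, g, t and k is introduced here only for illustration, and it assumes X and Y are planar coordinates, so each step is a straight-line (Euclidean) distance.

```latex
D_{g,1} = 0, \qquad
D_{g,t} = \sum_{k=2}^{t} \sqrt{(X_{g,k} - X_{g,k-1})^2 + (Y_{g,k} - Y_{g,k-1})^2} \quad (t \ge 2)
```

The total distance for a trial is then just the value at its last hour.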

I've created a fake df below.

 library(ggplot2)

 paths <- data.frame(matrix(nrow=80,ncol=5))
 colnames(paths) <- c("trt","trial","hour","X","Y")
 paths$trt <- rep(c("A","B","C","D"),each=20)                     # 4 treatments
 paths$trial <- rep(c(rep(1,times=10),rep(2,times=10)),times=4)   # 2 trials per treatment
 paths$hour <- rep(1:10,times=8)                                  # 10 hourly fixes per trial
 paths[,4:5] <- runif(160,0,50)                                   # random X and Y coordinates

 # This shows the paths that I want to measure.
 ggplot(data=paths,aes(x=X,y=Y,group=interaction(trt,trial),color=trt))+
   geom_path()

I probably want to add a column, paths$dist.traveled, that holds the cumulative distance at each hour.

I think I could use apply, or maybe even aggregate, but I've been using pointDistance to find the distances, so I'm a bit confused. I would also rather not write a loop inside a loop, because the real dataset is large.
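A minimal base-R sketch of that running total, with no explicit loops. It assumes X and Y are planar coordinates, so it computes straight-line distances directly instead of calling pointDistance (for longitude/latitude you would need a great-circle distance instead). The seeded data.frame() set-up is a stand-in for the fake data above, not the original code. It relies on base functions diff(), duplicated() and ave():

```r
# Seeded stand-in for the question's fake data (illustrative only)
set.seed(1)
paths <- data.frame(
  trt   = rep(c("A", "B", "C", "D"), each = 20),
  trial = rep(rep(1:2, each = 10), times = 4),
  hour  = rep(1:10, times = 8),
  X     = runif(80, 0, 50),
  Y     = runif(80, 0, 50)
)

# Rows must be in time order within each trt/trial path
paths <- paths[order(paths$trt, paths$trial, paths$hour), ]

# Straight-line length of each hourly step, computed for all rows at once
step <- c(0, sqrt(diff(paths$X)^2 + diff(paths$Y)^2))

# The first row of each trt/trial has no previous point in the same path,
# so its step would cross into the previous path: set it to 0
first_row <- !duplicated(paths[, c("trt", "trial")])
step[first_row] <- 0

# Running total of the step lengths within each trt/trial
paths$dist.traveled <- ave(step, paths$trt, paths$trial, FUN = cumsum)

head(paths, 3)
```

Because diff() and ave() work on whole vectors, this stays a single pass over the data no matter how many groups there are.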

Upvotes: 0

Views: 347

Answers (2)

denisafonin

Reputation: 1136

Is this what you are trying to achieve?

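library(dplyr)

# distance from the previous row (assumes rows are sorted by trt, trial, hour)
paths <- paths %>%
  mutate(dist.traveled = sqrt((X-lag(X))^2 + (Y-lag(Y))^2))
paths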


   trt   trial  hour      X      Y dist.traveled
   <chr> <dbl> <int>  <dbl>  <dbl>         <dbl>
 1 A         1     1 11.2   26.9           NA   
 2 A         1     2 20.1    1.48          27.0 
 3 A         1     3 30.4    0.601         10.4 
 4 A         1     4 31.1   26.6           26.0 
 5 A         1     5 38.1   30.4            7.88
 6 A         1     6 27.9   47.9           20.2 
 7 A         1     7 16.5   35.3           16.9 
 8 A         1     8  0.328 13.0           27.6 
 9 A         1     9 14.0   41.7           31.8 
10 A         1    10 29.7    7.27          37.8 
# ... with 70 more rows


# hour 1 starts a new trial, so its distance from the previous row crosses paths: drop it
paths$dist.traveled[paths$hour == 1] <- NA

paths %>%
  group_by(trt)%>%
  summarise(total_distance = sum(dist.traveled, na.rm = TRUE))



  trt   total_distance
  <chr>          <dbl>
1 A               492.
2 B               508.
3 C               479.
4 D               462.

I add a new column with the distance from the previous hour, set it to NA at hour 1 so that no distance is measured across two trials, and then sum those distances for each trt.
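An alternative to patching hour 1 by hand, sketched under the assumption that dplyr is installed: grouping by both trt and trial before calling lag() makes lag() restart in every path, so the first hour of each trial gets NA automatically. The seeded data set-up and the object names steps and totals are illustrative, not part of the answer above.

```r
library(dplyr)

# Seeded stand-in for the question's fake data (illustrative only)
set.seed(1)
paths <- data.frame(
  trt   = rep(c("A", "B", "C", "D"), each = 20),
  trial = rep(rep(1:2, each = 10), times = 4),
  hour  = rep(1:10, times = 8),
  X     = runif(80, 0, 50),
  Y     = runif(80, 0, 50)
)

steps <- paths %>%
  arrange(trt, trial, hour) %>%
  group_by(trt, trial) %>%
  # lag() works within each trt/trial group, so hour 1 of every trial is NA
  mutate(dist.traveled = sqrt((X - lag(X))^2 + (Y - lag(Y))^2)) %>%
  ungroup()

# Total distance per treatment, summed over both trials
totals <- steps %>%
  group_by(trt) %>%
  summarise(total_distance = sum(dist.traveled, na.rm = TRUE))

totals
```

This gives the same step distances as the NA-patching approach, but it does not depend on every trial starting at hour 1.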

Upvotes: 1

Hamed

Reputation: 228

Here's an answer that uses {dplyr}:

library(dplyr)
paths %>% 
    arrange(trt, trial, hour) %>% 
    group_by(trt, trial) %>% 
    mutate(dist_travelled = sqrt((X - lag(X))^2 + (Y - lag(Y))^2)) %>% 
    mutate(total_dist = sum(dist_travelled, na.rm = TRUE)) %>% 
    ungroup()

If you want the total distance grouped only by trt and not by trial, just remove trial from the call to group_by().
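The question asks for a running (cumulative) distance on each row, whereas total_dist above repeats the per-trial total on every row. A minimal variation that keeps both, assuming the same dplyr setup. The intermediate column name step and the seeded fake data are illustrative choices, not from the original answer. It uses dplyr's coalesce() to count the missing first step as 0 before cumsum():

```r
library(dplyr)

# Seeded stand-in for the question's fake data (illustrative only)
set.seed(1)
paths <- data.frame(
  trt   = rep(c("A", "B", "C", "D"), each = 20),
  trial = rep(rep(1:2, each = 10), times = 4),
  hour  = rep(1:10, times = 8),
  X     = runif(80, 0, 50),
  Y     = runif(80, 0, 50)
)

res <- paths %>%
  arrange(trt, trial, hour) %>%
  group_by(trt, trial) %>%
  mutate(
    # distance from the previous hour; NA for the first hour of each trial
    step = sqrt((X - lag(X))^2 + (Y - lag(Y))^2),
    # running total: treat the missing first step as 0 before summing
    dist_travelled = cumsum(coalesce(step, 0)),
    # total for the whole trial, repeated on every row
    total_dist = sum(step, na.rm = TRUE)
  ) %>%
  ungroup()

res
```

On the last hour of each trial, dist_travelled equals total_dist.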

Upvotes: 3
