Meg.abytes

Reputation: 179

How to make a loop for code including pipes

I am fairly new to R and am trying to avoid copy-pasting the same lines 20 times, which is what I am currently doing manually. I have a data frame with three variables: date.time, Depth, and ms (sample):

 date.time               Depth      ms
 1: 2015-12-20 00:48:50 113.5  0.316666667
 2: 2015-12-20 01:25:50 156.0 -0.966666667
 3: 2015-12-20 01:26:50 170.5 -0.241666667
 4: 2015-12-20 01:27:50 215.5 -0.750000000
 5: 2015-12-20 01:28:50 276.5 -1.016666667
 6: 2015-12-20 01:29:50 373.0 -1.608333333
 7: 2015-12-20 01:30:50 453.0 -1.333333333
 8: 2015-12-20 01:31:50 500.0 -0.783333333
 9: 2015-12-20 01:35:50 512.0  0.241666667
10: 2015-12-20 03:53:50 285.0  0.058333333
11: 2015-12-20 03:54:50 355.0 -1.166666667
12: 2015-12-20 03:55:50 453.5 -1.641666667
13: 2015-12-20 03:57:50 526.0  0.000000000
14: 2015-12-21 15:01:50 449.5  0.016666667
15: 2015-12-21 15:02:50 467.5 -0.300000000
16: 2015-12-21 16:07:50 308.5  0.100000000
17: 2015-12-21 16:08:50 392.0 -1.391666667
18: 2015-12-21 16:09:50 491.0 -1.650000000
19: 2015-12-21 16:11:50 581.0  0.000000000
20: 2015-12-22 22:02:50 461.0  0.075000000
21: 2015-12-22 22:03:50 463.0 -0.033333333
22: 2015-12-22 22:04:50 466.0 -0.050000000
23: 2015-12-22 22:05:50 467.5 -0.025000000
24: 2015-12-22 22:06:50 468.0 -0.008333333
25: 2015-12-22 22:07:50 471.0 -0.050000000
26: 2015-12-22 22:08:50 472.5 -0.025000000
27: 2015-12-22 22:09:50 530.0 -0.958333333

I have manually gone through this to separate each dive by selecting the rows where a dive starts and ends, for example:

d1<- df[c(1:9),]
d2<- df[c(10:13),]
d3<- df[c(14:20),]
d4<- df[c(21:27),]

and am then applying the following code to each new data frame (d1, d2, d3, d4). Below is an example for d1:

d1 <- df[c(1:9),]
d1$date.time <- as_datetime(d1$date.time)
str(d1)

d1 %>% 
  group_by(Ptt) %>% # Ptt is a tag ID column in the full dataset (not in the sample shown)
  mutate(
    diffMin = difftime(date.time, lag(date.time, 1, default = date.time[1]), units = "mins") %>% # time diff of each row
      as.numeric() %>% # change to numeric
      cumsum() # cumulative sum
  ) -> d1
d1$Divenumber <- as.character('1')

This is giving me the desired output of:

d1
         date.time           Depth     ms diffMin Divenumber
       <dttm>              <dbl>  <dbl>   <dbl> <chr>     
     1 2015-12-20 00:48:50  114.  0.317       0 1         
     2 2015-12-20 01:25:50  156  -0.967      37 1         
     3 2015-12-20 01:26:50  170. -0.242      38 1         
     4 2015-12-20 01:27:50  216. -0.75       39 1         
     5 2015-12-20 01:28:50  276. -1.02       40 1         
     6 2015-12-20 01:29:50  373  -1.61       41 1         
     7 2015-12-20 01:30:50  453  -1.33       42 1         
     8 2015-12-20 01:31:50  500  -0.783      43 1         
     9 2015-12-20 01:35:50  512   0.242      47 1         
    

d2

    date.time           Depth      ms diffMin Divenumber
      <dttm>              <dbl>   <dbl>   <dbl> <chr>     
    1 2015-12-20 03:53:50  285   0.0583       0 2         
    2 2015-12-20 03:54:50  355  -1.17         1 2         
    3 2015-12-20 03:55:50  454. -1.64         2 2         
    4 2015-12-20 03:57:50  526   0            4 2  

for each new data frame, but as you can see this involves a lot of copy-pasting to create each one and then bind them all at the end. I am sure there is a faster way, but I can't quite get it right after a few hours of trying. Can someone please help me do this (perhaps with some type of loop), so that I can loop through the entire dataset, assign a new dive number to each dive, and compute the time difference in minutes from the start to the end of that dive? It would also be great not to have to separate the dives manually in the future; I can only think of building something with case_when, lag, and date.time to differentiate dives, but I am happy for any other suggestions!

Here is a dput for a subset of my data:

structure(list(date.time = structure(c(1450572530, 1450574750, 
1450574810, 1450574870, 1450574930, 1450574990, 1450575050, 1450575110, 
1450575350, 1450583630, 1450583690, 1450583750, 1450583870, 
1450710110, 1450710170, 1450714070, 1450714130, 1450714190, 1450714310, 
1450821770, 1450821830, 1450821890, 1450821950, 1450822010, 1450822070, 
1450822130, 1450822190), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    Depth = c(113.5, 156, 170.5, 215.5, 276.5, 373, 453, 500, 
    512, 285, 355, 453.5, 526, 449.5, 467.5, 308.5, 392, 
    491, 581, 461, 463, 466, 467.5, 468, 471, 472.5, 530), ms = c(0.316666666666667, 
    -0.966666666666667, -0.241666666666667, -0.75, -1.01666666666667, 
    -1.60833333333333, -1.33333333333333, -0.783333333333333, 
    0.241666666666667, 0.0583333333333333, 
    -1.16666666666667, -1.64166666666667, 0, 0.0166666666666667, 
    -0.3, 0.1, -1.39166666666667, -1.65, 0, 0.075, -0.0333333333333333, 
    -0.05, -0.025, -0.00833333333333333, -0.05, -0.025, -0.958333333333333
    )), row.names = c(NA, -27L), class = c("data.table", "data.frame"
))

Thanks in advance

Upvotes: 2

Views: 221

Answers (3)

Meg.abytes

Reputation: 179

I ran the code posted by Ronak, then used pipes to group by dive and calculate the cumulative dive time:

df <- df %>% 
  group_by(dive) %>%
  mutate(
    diffMin = difftime(date.time, lag(date.time, 1, default = date.time[1]), units = "mins") %>% # time diff of each row
      as.numeric() %>% # change to numeric
      cumsum()) # cumulative sum
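The same per-dive elapsed time can also be computed without dplyr. Here is a minimal base-R sketch using ave(); the toy timestamps and dive labels below are invented for illustration only:

```r
# Toy data: two dives sampled one minute apart, with a 4-minute gap inside dive 1
t0 <- as.POSIXct("2015-12-20 00:00:00", tz = "UTC")
toy <- data.frame(
  date.time = t0 + 60 * c(0, 1, 2, 6, 200, 201),
  dive      = c(1, 1, 1, 1, 2, 2)
)

# ave() applies the function within each dive group:
# minutes elapsed since the first sample of that dive
toy$diffMin <- ave(as.numeric(toy$date.time), toy$dive,
                   FUN = function(x) (x - x[1]) / 60)

toy$diffMin
# [1] 0 1 2 6 0 1
```

Subtracting the group's first timestamp gives the same result as cumulatively summing the row-to-row differences, with no lag() needed.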

Upvotes: 0

Shibaprasad

Reputation: 1332

A different approach: a simple while loop that does what you asked, using the dive logic you described in the comment. Let me know if you have any doubts.

#Load the data in df

#Create a vector for the dive numbers. Set the first element to 1, as the first row belongs to dive 1

dive <- c(1)

#Create a counter
dive_count <- 1

#Start the while loop from i = 2, as the first row is already assigned to dive 1

i <- 2
while (i <= nrow(df)) {
  if (df$Depth[i] > df$Depth[i - 1]) {
    dive[i] <- dive_count
  } else {
    dive_count <- dive_count + 1
    dive[i] <- dive_count
  }
  i <- i + 1
}

df$dive <- dive
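The loop's rule (a new dive starts whenever Depth stops increasing) can also be written as a single vectorized expression; a sketch on a toy depth series, with the values invented for illustration:

```r
# Toy depth series: three dives, each starting where depth stops increasing
Depth <- c(113.5, 156, 512, 285, 355, 526, 449.5, 467.5)

# TRUE at each row that starts a new dive (the first row, or any row
# whose depth did not increase relative to the previous one)
dive <- cumsum(c(TRUE, diff(Depth) <= 0))

dive
# [1] 1 1 1 2 2 2 3 3
```

cumsum() over the logical vector turns each "new dive starts here" flag into an incrementing dive number, avoiding the explicit loop and counter.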

Check the final data frame

df

             date.time Depth           ms dive
1  2015-12-20 00:48:50 113.5  0.316666667    1
2  2015-12-20 01:25:50 156.0 -0.966666667    1
3  2015-12-20 01:26:50 170.5 -0.241666667    1
4  2015-12-20 01:27:50 215.5 -0.750000000    1
5  2015-12-20 01:28:50 276.5 -1.016666667    1
6  2015-12-20 01:29:50 373.0 -1.608333333    1
7  2015-12-20 01:30:50 453.0 -1.333333333    1
8  2015-12-20 01:31:50 500.0 -0.783333333    1
9  2015-12-20 01:35:50 512.0  0.241666667    1
10 2015-12-20 03:53:50 285.0  0.058333333    2
11 2015-12-20 03:54:50 355.0 -1.166666667    2
12 2015-12-20 03:55:50 453.5 -1.641666667    2
13 2015-12-20 03:57:50 526.0  0.000000000    2
14 2015-12-21 15:01:50 449.5  0.016666667    3
15 2015-12-21 15:02:50 467.5 -0.300000000    3
16 2015-12-21 16:07:50 308.5  0.100000000    4
17 2015-12-21 16:08:50 392.0 -1.391666667    4
18 2015-12-21 16:09:50 491.0 -1.650000000    4
19 2015-12-21 16:11:50 581.0  0.000000000    4
20 2015-12-22 22:02:50 461.0  0.075000000    5
21 2015-12-22 22:03:50 463.0 -0.033333333    5
22 2015-12-22 22:04:50 466.0 -0.050000000    5
23 2015-12-22 22:05:50 467.5 -0.025000000    5
24 2015-12-22 22:06:50 468.0 -0.008333333    5
25 2015-12-22 22:07:50 471.0 -0.050000000    5
26 2015-12-22 22:08:50 472.5 -0.025000000    5
27 2015-12-22 22:09:50 530.0 -0.958333333    5

Upvotes: 2

Ronak Shah

Reputation: 388862

Keeping the threshold at 2 hours, you can create the dive column automatically using cumsum:

library(dplyr)

n_seconds <- 7200 #2hours

df <- df %>% 
       mutate(dive = cumsum(difftime(date.time, 
                     lag(date.time, default = first(date.time) - n_seconds - 1), 
                     units = 'secs') > n_seconds))
df

#             date.time Depth           ms dive
#1  2015-12-20 00:48:50 113.5  0.316666667    1
#2  2015-12-20 01:25:50 156.0 -0.966666667    1
#3  2015-12-20 01:26:50 170.5 -0.241666667    1
#4  2015-12-20 01:27:50 215.5 -0.750000000    1
#5  2015-12-20 01:28:50 276.5 -1.016666667    1
#6  2015-12-20 01:29:50 373.0 -1.608333333    1
#7  2015-12-20 01:30:50 453.0 -1.333333333    1
#8  2015-12-20 01:31:50 500.0 -0.783333333    1
#9  2015-12-20 01:35:50 512.0  0.241666667    1
#10 2015-12-20 03:53:50 285.0  0.058333333    2
#11 2015-12-20 03:54:50 355.0 -1.166666667    2
#12 2015-12-20 03:55:50 453.5 -1.641666667    2
#13 2015-12-20 03:57:50 526.0  0.000000000    2
#14 2015-12-21 15:01:50 449.5  0.016666667    3
#15 2015-12-21 15:02:50 467.5 -0.300000000    3
#16 2015-12-21 16:07:50 308.5  0.100000000    3
#17 2015-12-21 16:08:50 392.0 -1.391666667    3
#18 2015-12-21 16:09:50 491.0 -1.650000000    3
#19 2015-12-21 16:11:50 581.0  0.000000000    3
#20 2015-12-22 22:02:50 461.0  0.075000000    4
#21 2015-12-22 22:03:50 463.0 -0.033333333    4
#22 2015-12-22 22:04:50 466.0 -0.050000000    4
#23 2015-12-22 22:05:50 467.5 -0.025000000    4
#24 2015-12-22 22:06:50 468.0 -0.008333333    4
#25 2015-12-22 22:07:50 471.0 -0.050000000    4
#26 2015-12-22 22:08:50 472.5 -0.025000000    4
#27 2015-12-22 22:09:50 530.0 -0.958333333    4

You may change the threshold as appropriate for your data; I chose 2 hours based on the sample provided.
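To see how the threshold drives the grouping, here is the same cumsum trick reduced to a toy vector of gaps between consecutive samples (the gap values are invented for illustration):

```r
n_seconds <- 7200                    # 2-hour threshold
gaps <- c(60, 7300, 60, 120, 90000)  # seconds between consecutive toy samples

# A new group starts at the first row and wherever a gap exceeds the threshold
group <- cumsum(c(TRUE, gaps > n_seconds))

group
# [1] 1 1 2 2 2 3
```

Raising or lowering n_seconds simply moves which gaps count as dive boundaries, so the right value depends on the typical surface interval in your full dataset.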

Upvotes: 1
