Reputation: 179
I am fairly new to R and am trying to avoid copy-pasting the same lines 20 times, which is how I am currently doing this manually. I have a dataframe with 3 variables: date.time, Depth and ms (sample below):
date.time Depth ms
1: 2015-12-20 00:48:50 113.5 0.316666667
2: 2015-12-20 01:25:50 156.0 -0.966666667
3: 2015-12-20 01:26:50 170.5 -0.241666667
4: 2015-12-20 01:27:50 215.5 -0.750000000
5: 2015-12-20 01:28:50 276.5 -1.016666667
6: 2015-12-20 01:29:50 373.0 -1.608333333
7: 2015-12-20 01:30:50 453.0 -1.333333333
8: 2015-12-20 01:31:50 500.0 -0.783333333
9: 2015-12-20 01:35:50 512.0 0.241666667
10: 2015-12-20 03:53:50 285.0 0.058333333
11: 2015-12-20 03:54:50 355.0 -1.166666667
12: 2015-12-20 03:55:50 453.5 -1.641666667
13: 2015-12-20 03:57:50 526.0 0.000000000
14: 2015-12-21 15:01:50 449.5 0.016666667
15: 2015-12-21 15:02:50 467.5 -0.300000000
16: 2015-12-21 16:07:50 308.5 0.100000000
17: 2015-12-21 16:08:50 392.0 -1.391666667
18: 2015-12-21 16:09:50 491.0 -1.650000000
19: 2015-12-21 16:11:50 581.0 0.000000000
20: 2015-12-22 22:02:50 461.0 0.075000000
21: 2015-12-22 22:03:50 463.0 -0.033333333
22: 2015-12-22 22:04:50 466.0 -0.050000000
23: 2015-12-22 22:05:50 467.5 -0.025000000
24: 2015-12-22 22:06:50 468.0 -0.008333333
25: 2015-12-22 22:07:50 471.0 -0.050000000
26: 2015-12-22 22:08:50 472.5 -0.025000000
27: 2015-12-22 22:09:50 530.0 -0.958333333
I have manually gone through this to separate each dive by selecting the rows where a dive starts and ends, for example:
d1 <- df[c(1:9), ]
d2 <- df[c(10:13), ]
d3 <- df[c(14:20), ]
d4 <- df[c(21:27), ]
I am then applying the following code to each new df I create (d1, d2, d3, d4); below is an example for d1:
d1 <- df[c(1:9), ]
d1$date.time <- as_datetime(d1$date.time)
str(d1)

d1 %>%
  group_by(Ptt) %>%   # Ptt is a column in my full dataset (not in the sample above)
  mutate(
    diffMin = difftime(date.time, lag(date.time, 1, default = date.time[1]), units = "mins") %>% # time diff of each row
      as.numeric() %>% # convert to numeric
      cumsum()         # cumulative sum
  ) -> d1

d1$Divenumber <- as.character('1')
This is giving me the desired output of:
d1
date.time Depth ms diffMin Divenumber
<dttm> <dbl> <dbl> <dbl> <chr>
1 2015-12-20 00:48:50 114. 0.317 0 1
2 2015-12-20 01:25:50 156 -0.967 37 1
3 2015-12-20 01:26:50 170. -0.242 38 1
4 2015-12-20 01:27:50 216. -0.75 39 1
5 2015-12-20 01:28:50 276. -1.02 40 1
6 2015-12-20 01:29:50 373 -1.61 41 1
7 2015-12-20 01:30:50 453 -1.33 42 1
8 2015-12-20 01:31:50 500 -0.783 43 1
9 2015-12-20 01:35:50 512 0.242 47 1
d2
date.time Depth ms diffMin Divenumber
<dttm> <dbl> <dbl> <dbl> <chr>
1 2015-12-20 03:53:50 285 0.0583 0 2
2 2015-12-20 03:54:50 355 -1.17 1 2
3 2015-12-20 03:55:50 454. -1.64 2 2
4 2015-12-20 03:57:50 526 0 4 2
for each new df. But as you can see, this means a lot of copy-pasting to create each new df and then bind them all at the end. I am sure there is a faster way to do this but can't quite get it right after a few hours of trying. Can someone please help me do this (perhaps with some type of loop) so that I can loop through the entire dataset, assign a new dive number to each dive, and get the time difference in minutes from the start of that dive to the end of that dive? It would also be great not to have to separate the dives manually in the future; I can only think of building something with case_when, lag and date.time to differentiate dives, but am happy for any other possible suggestions!
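To make that idea concrete, something like this is what I was imagining with case_when and lag (only a sketch; the 120-minute cutoff and the new.dive/Divenumber names are placeholders I made up, and the cutoff may not suit the full dataset):
library(dplyr)

df %>%
  mutate(
    # flag the first row of each dive: either the very first row,
    # or a row that comes more than an assumed 120 minutes after the previous sample
    new.dive = case_when(
      is.na(lag(date.time)) ~ 1L,
      difftime(date.time, lag(date.time), units = "mins") > 120 ~ 1L,
      TRUE ~ 0L
    ),
    # running total of the flags gives the dive number
    Divenumber = cumsum(new.dive)
  )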
Here is a dput for a subset of my data:
structure(list(date.time = structure(c(1450572530, 1450574750,
1450574810, 1450574870, 1450574930, 1450574990, 1450575050, 1450575110,
1450575350, 1450583630, 1450583690, 1450583750, 1450583870,
1450710110, 1450710170, 1450714070, 1450714130, 1450714190, 1450714310,
1450821770, 1450821830, 1450821890, 1450821950, 1450822010, 1450822070,
1450822130, 1450822190), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
Depth = c(113.5, 156, 170.5, 215.5, 276.5, 373, 453, 500,
512, 285, 355, 453.5, 526, 449.5, 467.5, 308.5, 392,
491, 581, 461, 463, 466, 467.5, 468, 471, 472.5, 530), ms = c(0.316666666666667,
-0.966666666666667, -0.241666666666667, -0.75, -1.01666666666667,
-1.60833333333333, -1.33333333333333, -0.783333333333333,
0.241666666666667, 0.0583333333333333,
-1.16666666666667, -1.64166666666667, 0, 0.0166666666666667,
-0.3, 0.1, -1.39166666666667, -1.65, 0, 0.075, -0.0333333333333333,
-0.05, -0.025, -0.00833333333333333, -0.05, -0.025, -0.958333333333333
)), row.names = c(NA, -27L), class = c("data.table", "data.frame"
))
Thanks in advance
Upvotes: 2
Views: 221
Reputation: 179
I applied the code from Ronak's answer and then used pipes to group by dive and calculate the cumulative dive time:
df <- df %>%
  group_by(dive) %>%
  mutate(
    diffMin = difftime(date.time, lag(date.time, 1, default = date.time[1]), units = "mins") %>% # time diff of each row
      as.numeric() %>% # convert to numeric
      cumsum()         # cumulative sum within the dive
  )
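For completeness, both steps can also be chained into one pipe (just a sketch; n_seconds is the same 2-hour threshold used in Ronak's answer):
library(dplyr)

n_seconds <- 7200  # assumed 2-hour gap separating dives (from Ronak's answer)

df <- df %>%
  # step 1: number the dives from the time gaps between samples
  mutate(dive = cumsum(difftime(date.time,
                                lag(date.time, default = first(date.time) - n_seconds - 1),
                                units = "secs") > n_seconds)) %>%
  # step 2: cumulative minutes since the start of each dive
  group_by(dive) %>%
  mutate(diffMin = cumsum(as.numeric(difftime(date.time,
                                              lag(date.time, default = first(date.time)),
                                              units = "mins")))) %>%
  ungroup()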
Upvotes: 0
Reputation: 1332
A different approach: I used a simple while loop to do what you asked, following the dive logic you described in the comments. Let me know if you have any doubts.
# Load the data into df

# Create a vector for the dive numbers. Set the first element to 1, as the
# first row automatically belongs to dive 1
dive <- c(1)

# Create a counter
dive_count <- 1

# Start the while loop at i = 2, as row 1 is already assigned to dive 1
i <- 2
while (i <= nrow(df)) {
  if (df$Depth[i] > df$Depth[i - 1]) {
    # depth still increasing: same dive
    dive[i] <- dive_count
  } else {
    # depth did not increase: start a new dive
    dive_count <- dive_count + 1
    dive[i] <- dive_count
  }
  i <- i + 1
}

df$dive <- dive
Check the final data frame
df
date.time Depth ms dive
1 2015-12-20 00:48:50 113.5 0.316666667 1
2 2015-12-20 01:25:50 156.0 -0.966666667 1
3 2015-12-20 01:26:50 170.5 -0.241666667 1
4 2015-12-20 01:27:50 215.5 -0.750000000 1
5 2015-12-20 01:28:50 276.5 -1.016666667 1
6 2015-12-20 01:29:50 373.0 -1.608333333 1
7 2015-12-20 01:30:50 453.0 -1.333333333 1
8 2015-12-20 01:31:50 500.0 -0.783333333 1
9 2015-12-20 01:35:50 512.0 0.241666667 1
10 2015-12-20 03:53:50 285.0 0.058333333 2
11 2015-12-20 03:54:50 355.0 -1.166666667 2
12 2015-12-20 03:55:50 453.5 -1.641666667 2
13 2015-12-20 03:57:50 526.0 0.000000000 2
14 2015-12-21 15:01:50 449.5 0.016666667 3
15 2015-12-21 15:02:50 467.5 -0.300000000 3
16 2015-12-21 16:07:50 308.5 0.100000000 4
17 2015-12-21 16:08:50 392.0 -1.391666667 4
18 2015-12-21 16:09:50 491.0 -1.650000000 4
19 2015-12-21 16:11:50 581.0 0.000000000 4
20 2015-12-22 22:02:50 461.0 0.075000000 5
21 2015-12-22 22:03:50 463.0 -0.033333333 5
22 2015-12-22 22:04:50 466.0 -0.050000000 5
23 2015-12-22 22:05:50 467.5 -0.025000000 5
24 2015-12-22 22:06:50 468.0 -0.008333333 5
25 2015-12-22 22:07:50 471.0 -0.050000000 5
26 2015-12-22 22:08:50 472.5 -0.025000000 5
27 2015-12-22 22:09:50 530.0 -0.958333333 5
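As a side note, the same rule (a new dive starts whenever the depth does not increase compared to the previous row) can also be written without the explicit loop. A sketch:
# TRUE for row 1 and for every row whose depth is <= the previous depth;
# the cumulative sum of those flags is the dive number
df$dive <- cumsum(c(TRUE, diff(df$Depth) <= 0))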
Upvotes: 2
Reputation: 388862
Keeping the threshold as 2 hours, you can create the dive column automatically by using cumsum as -
library(dplyr)
n_seconds <- 7200 # 2 hours

df <- df %>%
  mutate(dive = cumsum(difftime(date.time,
                                lag(date.time, default = first(date.time) - n_seconds - 1),
                                units = 'secs') > n_seconds))
df
# date.time Depth ms dive
#1 2015-12-20 00:48:50 113.5 0.316666667 1
#2 2015-12-20 01:25:50 156.0 -0.966666667 1
#3 2015-12-20 01:26:50 170.5 -0.241666667 1
#4 2015-12-20 01:27:50 215.5 -0.750000000 1
#5 2015-12-20 01:28:50 276.5 -1.016666667 1
#6 2015-12-20 01:29:50 373.0 -1.608333333 1
#7 2015-12-20 01:30:50 453.0 -1.333333333 1
#8 2015-12-20 01:31:50 500.0 -0.783333333 1
#9 2015-12-20 01:35:50 512.0 0.241666667 1
#10 2015-12-20 03:53:50 285.0 0.058333333 2
#11 2015-12-20 03:54:50 355.0 -1.166666667 2
#12 2015-12-20 03:55:50 453.5 -1.641666667 2
#13 2015-12-20 03:57:50 526.0 0.000000000 2
#14 2015-12-21 15:01:50 449.5 0.016666667 3
#15 2015-12-21 15:02:50 467.5 -0.300000000 3
#16 2015-12-21 16:07:50 308.5 0.100000000 3
#17 2015-12-21 16:08:50 392.0 -1.391666667 3
#18 2015-12-21 16:09:50 491.0 -1.650000000 3
#19 2015-12-21 16:11:50 581.0 0.000000000 3
#20 2015-12-22 22:02:50 461.0 0.075000000 4
#21 2015-12-22 22:03:50 463.0 -0.033333333 4
#22 2015-12-22 22:04:50 466.0 -0.050000000 4
#23 2015-12-22 22:05:50 467.5 -0.025000000 4
#24 2015-12-22 22:06:50 468.0 -0.008333333 4
#25 2015-12-22 22:07:50 471.0 -0.050000000 4
#26 2015-12-22 22:08:50 472.5 -0.025000000 4
#27 2015-12-22 22:09:50 530.0 -0.958333333 4
You may change the threshold as appropriate for your data; I chose 2 hours based on the sample provided.
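If you want to experiment with different cutoffs, one option (just a suggestion; the add_dive name is made up) is to wrap this in a small helper with the threshold as an argument:
library(dplyr)

# Illustrative helper: adds a dive column, starting a new dive whenever the gap
# between consecutive samples exceeds gap_seconds
add_dive <- function(data, gap_seconds = 7200) {
  data %>%
    mutate(dive = cumsum(difftime(date.time,
                                  lag(date.time, default = first(date.time) - gap_seconds - 1),
                                  units = 'secs') > gap_seconds))
}

df <- add_dive(df)  # 2-hour cutoff, same result as above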
Upvotes: 1