Reputation: 79
Good day,
This is a continuation question to this post
Here are some dummy data:
Date <- as.POSIXct(c('2018-03-20 11:52:25', '2018-03-22 12:01:44', '2018-03-20 12:05:25', '2018-03-20 12:10:40', '2018-03-20 12:12:51 ', '2018-03-21 2:01:23', '2018-03-21 2:45:01', '2018-03-21 3:30:00', '2018-03-21 3:45:00', '2018-03-21 5:00:00', '2018-03-21 5:45:00'))
Sites<-c(4, 4, 4, 6, 6, 7, 7, 4, 4, 6, 6)
Individual<-c("A", "A", "A", "A", "A", "A", "A", "A", "A", "A","A")
data<-data.frame(Individual, Date, Sites)
Individual Date Sites
A 2018-03-20 11:52:25 4
A 2018-03-22 12:01:44 4
A 2018-03-20 12:05:25 4
A 2018-03-20 12:10:40 6
A 2018-03-20 12:12:51 6
A 2018-03-21 02:01:23 7
A 2018-03-21 02:45:01 7
A 2018-03-21 03:30:00 4
A 2018-03-21 03:45:00 4
A 2018-03-21 05:00:00 6
A 2018-03-21 05:45:00 6
Basically, I would like R to tell me how much time is spent at each site. The data above, have repeating instances at sites and I would like R to tease out the repetitions and add the time differences for each.
I have tried the following code:
data.summary<-data %>%
group_by(Individual, Sites) %>%
summarise(time_spent = max(Date)-min(Date))
But this will take a time difference from just the minimum date at that site and the maximum date at the site, not accounting for instances of repetition, or times that the individual is at other sites.
Digging further into the dummy data, the summarize code says that individual A spent 2 days at site 4. However that individual left site 4 and reentered the site at a later date and should have a total time at site 4 of 28 minutes. How can I get R to reflect repetitive entries for that site?
Date1<-as.POSIXct("2018-03-20 11:52:25") # First instance at site 4
Date2<-as.POSIXct("2018-03-20 12:05:25") # Last time A spent at site 4 before leaving
difftime(Date2, Date1, units="mins")
# time diff = 13 minutes
# Second instance at site 4
Date3<-as.POSIXct("2018-03-21 03:30:00") # Second instance at site 4
Date4<-as.POSIXct("2018-03-21 03:45:00") # Last time A spent at site 4
difftime(Date4, Date3, units="mins")
# time diff= 15 mins
Thanks!
EDIT: I'm finding an issue with dplyr summarise, where extra time is being added. Here are dummy data:
Dates<-as.POSIXct(c("2018-04-09 16:59:03",
"2018-04-09 18:27:23",
"2018-04-09 17:01:20",
"2018-04-09 17:41:17"))
Individual<-c("A","A","A","A")
Site<-c("40","40","40", "40")
data<-data.frame(Dates, Individual, Site)
I want to summarize the time spent at site 40, with the minimum time stamp at this site subtracted from the maximum time stamp at the site
data %>%
group_by(Individual) %>%
arrange(Dates) %>%
group_by(Individual, Site) %>%
summarise(time_spent = max(Dates) - min(Dates))
# A tibble: 1 x 3
# Groups: Individual [?]
Individual Site time_spent
<fct> <fct> <time>
1 A 40 1.472222 hours
This says the total time spent at this site is 1.47 hours. However, when I manually get a time difference I get an entirely different value.
maxtime<-("2018-04-09 17:41:17")
mintime<-("2018-04-09 16:59:03")
difftime(maxtime, mintime, units="hours")
# Time difference of 0.7038889 hours
The actual time at site 40 is 0.70 hours. I'm not quite sure what summarise is referencing, or why extra time is being added.
EDIT 2: Okay, this looks like a units issue! Here is more reproducible data:
Dates<-as.POSIXct(c("2018-04-09 16:43:44","2018-03-20 11:52:25", "2018-04-09 16:59:03",
"2018-04-09 18:27:23",
"2018-04-09 17:01:20",
"2018-04-09 17:41:17"))
Individual<-c("A","A","A","A", "A","A")
Site<-c("38","38", "40","40","40", "40")
data<-data.frame(Dates, Individual, Site)
Dates Individual Site
1 2018-04-09 16:43:44 A 38
2 2018-03-20 11:52:25 A 38
3 2018-04-09 16:59:03 A 40
4 2018-04-09 18:27:23 A 40
5 2018-04-09 17:01:20 A 40
6 2018-04-09 17:41:17 A 40
data %>%
group_by(Individual) %>%
arrange(Dates) %>%
group_by(Individual, Site) %>%
summarise(time_spent = max(Dates) - min(Dates))
# A tibble: 2 x 3
# Groups: Individual [?]
Individual Site time_spent
<fct> <fct> <time>
1 A 38 20.202303 days
2 A 40 1.472222 days
Here, it says time spent at site 40 is 1.47 days, but this should be hours! According to manually finding time differences below:
maxtime<-("2018-04-09 18:27:23")
mintime<-("2018-04-09 16:59:03")
difftime(maxtime, mintime, units="hours")
# Time difference of 1.472222 hours
How can I correct this units issue? Instead of displaying hours intermixed with days, I would like R to calculate the time for all sites in days.
Upvotes: 2
Views: 305
Reputation: 5138
EDITED SOLUTION: after some trial and error this is what ended up working. This uses a function from data.table
so you'll need to have that installed.
Step 1: create a unique ID for all site observations (by site), ordered by Date
data %>%
arrange(Individuals, Dates) %>%
mutate(rle_id = data.table::rleid(Sites))
Dates Individuals Sites rle_id
1 2018-03-20 11:52:25 A 38 1
2 2018-04-09 16:43:44 A 38 1
3 2018-04-09 16:59:03 A 40 2
4 2018-04-09 17:01:20 A 40 2
5 2018-04-09 17:41:17 A 40 2
6 2018-04-09 18:27:23 A 40 2
7 2018-03-20 11:52:25 B 4 3
8 2018-03-20 12:05:25 B 4 3
9 2018-03-20 12:10:40 B 6 4
10 2018-03-20 12:12:51 B 6 4
11 2018-03-21 02:01:23 B 7 5
12 2018-03-21 02:45:01 B 7 5
13 2018-03-21 03:30:00 B 4 6
14 2018-03-21 03:45:00 B 4 6
15 2018-03-21 05:00:00 B 6 7
16 2018-03-21 05:45:00 B 6 7
17 2018-03-22 12:01:44 B 4 8
You could get the relid using something in base like what I have pasted below, but it is probably much slower (and harder to understand)
data <- data[order(data$Dates),]
rle_lengths <- rle(data$Sites)$lengths
unlist(Map(rep, 1:length(rle_lengths), rle_lengths))
[1] 1 2 2 3 3 4 4 5 5 6 6 7 8 9 9 9 9
vs.
data.table::rleid(data$Sites)
[1] 1 2 2 3 3 4 4 5 5 6 6 7 8 9 9 9 9
Step 2: get the time for individual A and B at each site. If we did not specify the units in difftime, it will do the calculation on individual units and display a common unit. E.g., 1.5 hours becomes 1.5 days if there is a someone there for several days.
data %>%
arrange(Individuals, Dates) %>%
mutate(rle_id = data.table::rleid(Sites)) %>%
group_by(Individuals, rle_id, Sites) %>%
summarise(time_spent = difftime(max(Dates), min(Dates), units = "days"))
# A tibble: 8 x 4
# Groups: Individuals, rle_id [8]
Individuals rle_id Sites time_spent
<fct> <int> <dbl> <time>
1 A 1 38 20.202303241 days
2 A 2 40 0.061342593 days
3 B 3 4 0.009027778 days
4 B 4 6 0.001516204 days
5 B 5 7 0.030300926 days
6 B 6 4 0.010416667 days
7 B 7 6 0.031250000 days
8 B 8 4 0.000000000 days
Step 3 (full solution): collapse across sites
data %>%
arrange(Individuals, Dates) %>%
mutate(rle_id = data.table::rleid(Sites)) %>%
group_by(Individuals, rle_id, Sites) %>%
summarise(time_spent = difftime(max(Dates), min(Dates), units = "days")) %>%
group_by(Individuals, Sites) %>%
summarise(time_spent_new = sum(time_spent))
# A tibble: 5 x 3
# Groups: Individuals [2]
Individuals Sites time_spent_new
<fct> <dbl> <time>
1 A 38 20.20230324 days
2 A 40 0.06134259 days
3 B 4 0.01944444 days
4 B 6 0.03276620 days
5 B 7 0.03030093 days
Data
Date <-as.POSIXct(c("2018-04-09 16:43:44","2018-03-20 11:52:25", "2018-04-09 16:59:03",
"2018-04-09 18:27:23","2018-04-09 17:01:20", "2018-04-09 17:41:17",
'2018-03-20 11:52:25', '2018-03-22 12:01:44', '2018-03-20 12:05:25',
'2018-03-20 12:10:40', '2018-03-20 12:12:51 ', '2018-03-21 2:01:23',
'2018-03-21 2:45:01', '2018-03-21 3:30:00', '2018-03-21 3:45:00',
'2018-03-21 5:00:00', '2018-03-21 5:45:00'))
Individual<-c(rep("A", 6), rep("B", 11))
Site<-c(38, 38, 40, 40, 40, 40, 4, 4, 4, 6, 6, 7, 7, 4, 4, 6, 6)
data<-data.frame(Dates = Date, Individuals = Individual, Sites = Site)
Upvotes: 1