Reputation: 1979
I am working on a time/date variable and trying to estimate the time spent for each record. I am following two steps for this analysis.
(a)
modify the variables in the desired format
(b)
calculate the time spent on each question.
Here how my dataset looks like:
id <- c(1,1,1,1,1, 2,2,2,2,2)
item.id <- c(1,2,3,4,5, 1,2,3,4,5)
submit.time <-c("2019-04-09 09:50:30.340","2019-04-09 09:52:12.440","2019-04-09 09:52:15.787","2019-04-09 09:53:21.587","2019-04-09 09:53:49.047",
"2019-04-09 09:49:45.243","2019-04-09 09:52:53.663","2019-04-09 09:53:23.293","2019-04-09 09:54:00.727","2019-04-09 09:54:52.400")
start.time <- c("04/09/2019 09:50:02.317 AM","04/09/2019 09:50:02.317 AM","04/09/2019 09:50:02.317 AM","04/09/2019 09:50:02.317 AM","04/09/2019 09:50:02.317 AM",
"04/09/2019 09:47:42.583 AM","04/09/2019 09:47:42.583 AM","04/09/2019 09:47:42.583 AM","04/09/2019 09:47:42.583 AM","04/09/2019 09:47:42.583 AM")
data <- data.frame(id, item.id,start.time, submit.time)
> data
id item.id start.time submit.time
1 1 1 04/09/2019 09:50:02.317 AM 2019-04-09 09:50:30.340
2 1 2 04/09/2019 09:50:02.317 AM 2019-04-09 09:52:12.440
3 1 3 04/09/2019 09:50:02.317 AM 2019-04-09 09:52:15.787
4 1 4 04/09/2019 09:50:02.317 AM 2019-04-09 09:53:21.587
5 1 5 04/09/2019 09:50:02.317 AM 2019-04-09 09:53:49.047
6 2 1 04/09/2019 09:47:42.583 AM 2019-04-09 09:49:45.243
7 2 2 04/09/2019 09:47:42.583 AM 2019-04-09 09:52:53.663
8 2 3 04/09/2019 09:47:42.583 AM 2019-04-09 09:53:23.293
9 2 4 04/09/2019 09:47:42.583 AM 2019-04-09 09:54:00.727
10 2 5 04/09/2019 09:47:42.583 AM 2019-04-09 09:54:52.400
id
for each student, item.id
is the id of the question, start.time
is the for login time to the exam(unique time for each student), and submit.time
is the time when a student submits the answer for each question.
(a)
editing data: this step includes removing AM|PM
and switching the order of the start.time
because I would like to use the submit.time
's format, and edited start.time
's format.
data$start.time <- gsub(" AM| PM", "", data$start.time) # exclude AM or PM
data$start.time <- gsub("/", "-", data$start.time) #replace / with -
dtparts = t(as.data.frame(strsplit(data$start.time,' '))) # split date and time
row.names(dtparts) = NULL
data$newdate <- strptime(as.character(dtparts[,1]), "%m-%d-%Y") # switch the date order
data$newdate <- as.POSIXct(data$newdate) # R was complaining about the time format-had to change here
data$start.time <- paste0(data$newdate," ",dtparts[,2]) # bring the time back
Now, the two-timing variables look the same. I transformed these dates and times to seconds.
data %>%
mutate(start.time.num = as.numeric(as.POSIXct(start.time), units="secs")) %>%
mutate(submit.time.num = as.numeric(as.POSIXct(submit.time), units="secs"))
id item.id start.time submit.time newdate start.time.num submit.time.num
1 1 1 2019-04-09 09:50:02.317 2019-04-09 09:50:30.340 2019-04-09 1554817802 1554817830
2 1 2 2019-04-09 09:50:02.317 2019-04-09 09:52:12.440 2019-04-09 1554817802 1554817932
3 1 3 2019-04-09 09:50:02.317 2019-04-09 09:52:15.787 2019-04-09 1554817802 1554817936
4 1 4 2019-04-09 09:50:02.317 2019-04-09 09:53:21.587 2019-04-09 1554817802 1554818002
5 1 5 2019-04-09 09:50:02.317 2019-04-09 09:53:49.047 2019-04-09 1554817802 1554818029
6 2 1 2019-04-09 09:47:42.583 2019-04-09 09:49:45.243 2019-04-09 1554817663 1554817785
7 2 2 2019-04-09 09:47:42.583 2019-04-09 09:52:53.663 2019-04-09 1554817663 1554817974
8 2 3 2019-04-09 09:47:42.583 2019-04-09 09:53:23.293 2019-04-09 1554817663 1554818003
9 2 4 2019-04-09 09:47:42.583 2019-04-09 09:54:00.727 2019-04-09 1554817663 1554818041
10 2 5 2019-04-09 09:47:42.583 2019-04-09 09:54:52.400 2019-04-09 1554817663 1554818092
(b)
in this step, I would like to calculate the time spent on each question here. For the first question of the first student, the time spent should be submit.time.num(1554817830) - start.time.num(1554817802)=28
. For the second question of the first student, the time spent should be submit.time.num(1554817932)
-
previous submit.time.num(1554817830)=102
. This procedure needs to be repeated for each student. When it gets to the second student, it should take the start.time
again for the first row of the second student.
So, the additional column should look like this:
> time.spent
time.spent
1 28
2 102
3 4
4 66
5 27
6 122
7 189
8 29
9 38
10 51
I apologize for posting this long, if you also have any suggestions for the first part, please let me know, more importantly, any suggestions for the part (b)
?
Thanks
Upvotes: 1
Views: 105
Reputation: 160447
This can be done a bit faster in a single pipeline.
library(dplyr)
data %>%
mutate(
start.time = as.POSIXct(start.time, format = "%m/%d/%Y %H:%M:%OS"),
submit.time = as.POSIXct(submit.time),
time.spent = difftime(submit.time, start.time, units = "secs")
) %>%
group_by(id) %>%
mutate(
time.spent = c(time.spent[1], diff(time.spent))
) %>%
ungroup()
# # A tibble: 10 x 5
# id item.id start.time submit.time time.spent
# <dbl> <dbl> <dttm> <dttm> <drtn>
# 1 1 1 2019-04-09 09:50:02 2019-04-09 09:50:30 28.023 secs
# 2 1 2 2019-04-09 09:50:02 2019-04-09 09:52:12 102.100 secs
# 3 1 3 2019-04-09 09:50:02 2019-04-09 09:52:15 3.347 secs
# 4 1 4 2019-04-09 09:50:02 2019-04-09 09:53:21 65.800 secs
# 5 1 5 2019-04-09 09:50:02 2019-04-09 09:53:49 27.460 secs
# 6 2 1 2019-04-09 09:47:42 2019-04-09 09:49:45 122.660 secs
# 7 2 2 2019-04-09 09:47:42 2019-04-09 09:52:53 188.420 secs
# 8 2 3 2019-04-09 09:47:42 2019-04-09 09:53:23 29.630 secs
# 9 2 4 2019-04-09 09:47:42 2019-04-09 09:54:00 37.434 secs
# 10 2 5 2019-04-09 09:47:42 2019-04-09 09:54:52 51.673 secs
Using @akrun's suggestion, we can shorten the code a little:
data %>%
group_by(id) %>%
mutate(
start.time = as.POSIXct(start.time, format = "%m/%d/%Y %H:%M:%OS"),
submit.time = as.POSIXct(submit.time),
time.spent = submit.time - lag(submit.time, default = first(start.time))
) %>%
ungroup()
(and optionally remove the columns you no longer need).
Upvotes: 2