m_zhang
m_zhang

Reputation: 1

calculate time difference in the same group

I want to convert the column time to be in time decimal format and then find the time interval within each group of the user_id. I have tried the answer below, but I could not get it to work:

Days difference between two dates in same column in R

structure(list(question_id = c(5502L, 5502L, 5502L, 5502L, 5502L
), user_id = c(112197L, 112197L, 112197L, 114033L, 114033L), 
    time = structure(c(1603720173, 1603720388, 1603720702, 1603603115, 
    1603949442), class = c("POSIXct", "POSIXt"), tzone = ""), 
    prediction = c(0.9, 0.95, 0.9, 0.99, 0.94), log_score = c(0.84799690655495, 
    0.925999418556223, 0.84799690655495, 0.985500430304885, 0.910732661902913
    )), row.names = 156182:156186, class = "data.frame")

Upvotes: 0

Views: 600

Answers (2)

crestor
crestor

Reputation: 1466

library(tidyverse)
library(lubridate)

df <- tibble::tribble(
  ~question_id, ~user_id, ~time, ~prediction, ~log_score,
  5502L,  112197L, "2020-10-26 14:49:33",         0.9,  0.84799690655495,
  5502L,  112197L, "2020-10-26 14:53:08",        0.95, 0.925999418556223,
  5502L,  112197L, "2020-10-26 14:58:22",         0.9,  0.84799690655495,
  5502L,  114033L, "2020-10-25 06:18:35",        0.99, 0.985500430304885,
  5502L,  114033L, "2020-10-29 06:30:42",        0.94, 0.910732661902913
)

df %>%
  as_tibble() %>%
  mutate(time = lubridate::ymd_hms(time)) %>%
  group_by(user_id) %>%
  mutate(diff = time - lag(time),
         diff2 = hms::hms(seconds_to_period(diff)))
#> # A tibble: 5 x 7
#> # Groups:   user_id [2]
#>   question_id user_id time                prediction log_score diff        diff2   
#>         <int>   <int> <dttm>                   <dbl>     <dbl> <drtn>      <time>  
#> 1        5502  112197 2020-10-26 14:49:33       0.9      0.848     NA secs       NA
#> 2        5502  112197 2020-10-26 14:53:08       0.95     0.926    215 secs 00:03:35
#> 3        5502  112197 2020-10-26 14:58:22       0.9      0.848    314 secs 00:05:14
#> 4        5502  114033 2020-10-25 06:18:35       0.99     0.986     NA secs       NA
#> 5        5502  114033 2020-10-29 06:30:42       0.94     0.911 346327 secs 96:12:07

Upvotes: 1

Ian Campbell
Ian Campbell

Reputation: 24770

Perhaps this is what you're looking for?

library(dplyr)
user_data %>%
   group_by(user_id) %>%
   summarise(day.interval = difftime(max(time), min(time),units = "days"))
# A tibble: 2 x 2
  user_id day.interval    
    <int> <drtn>          
1  112197 0.006122685 days
2  114033 4.008414352 days

Upvotes: 1

Related Questions