KintensT
KintensT

Reputation: 9

R:Sorting rows with time within multiple time interval

I want to pick up rows of which time data is between multiple intervals.

The data frame is like this:

dputs

structure(list(ID = c("A", "A", "A", "A", "A", "B", "B", "B", 
"B", "B"), score_time = c("2022/09/01 9:00:00", "2022/09/02 18:00:00", 
"2022/09/03 12:00:00", NA, NA, "2022/09/15 18:00:00", "2022/09/18 20:00:00", 
NA, NA, NA), score = c(243, 232, 319, NA, NA, 436, 310, NA, NA, 
NA), treatment_start = c(NA, NA, NA, "2022/09/02 8:00:00", "2022/09/03 11:00:00", 
NA, NA, "2022/09/15 8:00:00", "2022/09/16 14:00:00", "2022/09/16 23:00:00"
), treatment_end = c(NA, NA, NA, "2022/09/02 22:00:00", "2022/09/09 12:00:00", 
NA, NA, "2022/09/16 2:00:00", "2022/09/16 22:00:00", "2022/09/17 0:00:00"
)), row.names = c(NA, -10L), spec = structure(list(cols = list(
    ID = structure(list(), class = c("collector_character", "collector"
    )), score_time = structure(list(), class = c("collector_character", 
    "collector")), score = structure(list(), class = c("collector_double", 
    "collector")), treatment_start = structure(list(), class = c("collector_character", 
    "collector")), treatment_end = structure(list(), class = c("collector_character", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
"collector")), delim = ","), class = "col_spec"), problems = <pointer: 0x6000000190b0>, class = c("spec_tbl_df", 
"tbl_df", "tbl", "data.frame"))  

   ID    score_time          score treatment_start     treatment_end      
   <chr> <chr>               <dbl> <chr>               <chr>              
 1 A     2022/09/01 9:00:00    243 NA                  NA                 
 2 A     2022/09/02 18:00:00   232 NA                  NA                 
 3 A     2022/09/03 12:00:00   319 NA                  NA                 
 4 A     NA                     NA 2022/09/02 8:00:00  2022/09/02 22:00:00
 5 A     NA                     NA 2022/09/03 11:00:00 2022/09/09 12:00:00
 6 B     2022/09/15 18:00:00   436 NA                  NA                 
 7 B     2022/09/18 20:00:00   310 NA                  NA                 
 8 B     NA                     NA 2022/09/15 8:00:00  2022/09/16 2:00:00 
 9 B     NA                     NA 2022/09/16 14:00:00 2022/09/16 22:00:00
10 B     NA                     NA 2022/09/16 23:00:00 2022/09/17 0:00:00 

Multiple score values are given for each ID with the measurement time.

And each ID has more than one information of treatment duration shown by start and end time.

My target is score values that are measured during treatment periods.

I tried with the package lubridate and tidyverse to mutate intervals but could not apply "%in%" method.

Here is my attempt until putting intervals in the same rows with score values.

data %>% 
  mutate(trt_interval = interval(start = treatment_start, end = treatment_end)) %>% 
  group_by(ID) %>% 
  mutate(num = row_number()) %>% 
  pivot_wider(names_from = num, names_prefix = "intvl", values_from = trt_interval) %>% 
  fill(c(intvl1:last_col()), .direction = "up")

Desired output is like this. (The first score of A and the last score of B dismissed because their score_time are out of interval.)

  ID    score
  <chr> <dbl>
1 A       232
2 A       319
3 B       436

I want to know the smarter way to put data in a row and how to apply "%in%" for multiple intervals.

Sorry that the question is not qualified and include multiple steps but any advices will be a great help for me.

Upvotes: 0

Views: 51

Answers (1)

Bertil Baron
Bertil Baron

Reputation: 5003

Hi I would first create two seperate data frames. One for the scores and one for the intervalls. Then would I join them both and filter the score that are within an treatment intervall.

data_score <- data %>% 
  filter(!is.na(score_time)) %>% 
  select(-starts_with("treat")) %>% 
  mutate(score_time = ymd_hms(score_time))
data_score
data_interval <- data %>% 
  filter(is.na(score_time)) %>% 
  select(ID,starts_with("treat")) %>%
  mutate(trt_interval = interval(start = treatment_start, end = treatment_end))
data_score %>% 
  inner_join(
    data_interval
  ) %>% 
  filter(
    lubridate::`%within%`(score_time,trt_interval )
  )

Hope this helps!!

Upvotes: 0

Related Questions