Benjamin Krick

Cleaning time series based on previous timepoints

In my clinical dataset, each row is uniquely identified by patient ID and time, followed by the variable of interest. The data look like this:

patientid <- c(100,100,100,101,101,101,102,102,102,104,104,104)
time <- c(1,2,3,1,2,3,1,2,3,1,2,3)
V1 <- c(1,1,NA,2,1,NA,1,3,NA,NA,1,NA)

Data <- data.frame(patientid=patientid, time=time, V1=V1)

Timepoint 3 is blank for every patient. I want to fill in timepoint 3 for each patient according to the following criteria. If the variable is coded 2 or 3 at either timepoint 1 or timepoint 2, timepoint 3 should be coded 2. If the variable is coded 1 at both timepoints 1 and 2, timepoint 3 should be coded 1. If data are missing at timepoint 1 or 2, timepoint 3 should be missing. For the toy example, the result should look like this:

patientid <- c(100,100,100,101,101,101,102,102,102,104,104,104)
time <- c(1,2,3,1,2,3,1,2,3,1,2,3)
V1 <- c(1,1,1,2,1,2,1,3,2,NA,1,NA)

Data <- data.frame(patientid=patientid, time=time, V1=V1)
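The rule itself can be written as a small vectorized function and checked against the expected output above. This is a minimal base R sketch, not taken from either answer; it assumes V1 is coded 1, 2 or 3 and that every patient has exactly one row for each of timepoints 1, 2 and 3. The helper name impute_t3 is hypothetical.

```r
# Hypothetical helper: the timepoint-3 rule, vectorized over patients.
# Assumes V1 is coded 1, 2 or 3 and each patient has rows for times 1, 2, 3.
impute_t3 <- function(t1, t2) {
  ifelse(is.na(t1) | is.na(t2), NA_real_,                  # missing at 1 or 2 -> missing
         ifelse(t1 %in% c(2, 3) | t2 %in% c(2, 3), 2,      # a 2 or 3 at either -> 2
                ifelse(t1 == 1 & t2 == 1, 1, NA_real_)))   # 1 at both -> 1
}

patientid <- c(100,100,100,101,101,101,102,102,102,104,104,104)
time <- c(1,2,3,1,2,3,1,2,3,1,2,3)
V1 <- c(1,1,NA,2,1,NA,1,3,NA,NA,1,NA)
Data <- data.frame(patientid = patientid, time = time, V1 = V1)

# Sort by patient, then time, so the time-1, time-2 and time-3 subsets line up by patient
Data <- Data[order(Data$patientid, Data$time), ]
t1 <- Data$V1[Data$time == 1]
t2 <- Data$V1[Data$time == 2]
Data$V1[Data$time == 3] <- impute_t3(t1, t2)
```

Because ifelse() works element-wise, one call handles all patients at once.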


Davis Tucker Weaver

This should do it!

library(tidyverse)

patientid <- c(100,100,100,101,101,101,102,102,102,104,104,104)
time <- c(1,2,3,1,2,3,1,2,3,1,2,3)
V1 <- c(1,1,NA,2,1,NA,1,3,NA,NA,1,NA)

Data <- data.frame(patientid=patientid, time=time, V1=V1)

Data <- Data %>% pivot_wider(names_from = "time", values_from = "V1", 
                             names_prefix = "timepoint_")

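# Rule for timepoint 3, given the values at timepoint 1 (x) and timepoint 2 (y)
timepoint_impute <- function(x, y) {
  if (is.na(x) || is.na(y)) {
    return(NA_real_)              # missing at 1 or 2 -> missing
  } else if (2 %in% c(x, y) || 3 %in% c(x, y)) {
    return(2)                     # a 2 or 3 at either timepoint -> 2
  } else if (x == 1 && y == 1) {
    return(1)                     # 1 at both timepoints -> 1
  } else {
    return(NA_real_)              # any other coding -> missing
  }
}

# map2_dbl() returns a numeric vector; plain map2() would create a list column
Data$timepoint_3 <- map2_dbl(.x = Data$timepoint_1, .y = Data$timepoint_2,
                             .f = timepoint_impute)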

This approach writes a custom function to encode your rules and applies it to each patient. You end up with the data in wide format; if you need long format, you can convert it back with tidyr::pivot_longer.
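A sketch of that pivot_longer() step, assuming tidyr 1.1.0 or later (for names_transform) and that timepoint_3 is a plain numeric column; a list column, as produced by map2(), would first need map2_dbl() or unnest(). The wide data frame below is a hypothetical stand-in typed out to match the imputed result.

```r
library(tidyr)

# Hypothetical wide result, shaped like the output of the code above
wide <- data.frame(
  patientid   = c(100, 101, 102, 104),
  timepoint_1 = c(1, 2, 1, NA),
  timepoint_2 = c(1, 1, 3, 1),
  timepoint_3 = c(1, 2, 2, NA)
)

long <- pivot_longer(
  wide,
  cols            = starts_with("timepoint_"),
  names_to        = "time",
  names_prefix    = "timepoint_",                 # drop the prefix added by pivot_wider
  names_transform = list(time = as.integer),      # "1" -> 1L instead of a character column
  values_to       = "V1"
)
```

Stripping the prefix and converting the names to integers gives back a numeric time column rather than strings like "timepoint_1".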

Mohan Govindasamy

You can use pivot_wider() from tidyr to convert your data to wide format, then mutate the column for timepoint 3 with a custom function applied to each row via pmap_dbl() from the purrr package. You can then return to the original shape of the data frame with pivot_longer().

library(tidyverse)

patientid <- c(100,100,100,101,101,101,102,102,102,104,104,104)
time <- c(1,2,3,1,2,3,1,2,3,1,2,3)
V1 <- c(1,1,NA,2,1,NA,1,3,NA,NA,1,NA)

df <- data.frame(patientid=patientid, time=time, V1=V1)

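# Rule for timepoint 3, given the values at timepoints 1 and 2
flag <- function(t1, t2) {
  if (is.na(t1) || is.na(t2)) {
    NA_real_                          # missing at 1 or 2 -> missing
  } else if (t1 %in% c(2, 3) || t2 %in% c(2, 3)) {
    2                                 # a 2 or 3 at either timepoint -> 2
  } else if (t1 == 1 && t2 == 1) {
    1                                 # 1 at BOTH timepoints -> 1
  } else {
    NA_real_                          # any other coding -> missing
  }
}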

df %>% 
  as_tibble() %>% 
  pivot_wider(names_from = time, values_from = V1) %>% 
  mutate(`3` = pmap_dbl(list(`1`,`2`),flag )) %>% 
  pivot_longer(-1, names_to = "time", values_to = "V1")
#> # A tibble: 12 x 3
#>    patientid time     V1
#>        <dbl> <chr> <dbl>
#>  1       100 1         1
#>  2       100 2         1
#>  3       100 3         1
#>  4       101 1         2
#>  5       101 2         1
#>  6       101 3         2
#>  7       102 1         1
#>  8       102 2         3
#>  9       102 3         2
#> 10       104 1        NA
#> 11       104 2         1
#> 12       104 3        NA

Created on 2021-01-29 by the reprex package (v0.3.0)
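Reshaping is not strictly necessary. As a minimal sketch, assuming dplyr 1.0 or later and exactly one row per patient per timepoint, the same rule can be applied directly to the long data with group_by() and case_when(). The helper columns t1 and t2 are hypothetical and are dropped at the end.

```r
library(dplyr)

df <- data.frame(
  patientid = c(100,100,100,101,101,101,102,102,102,104,104,104),
  time      = c(1,2,3,1,2,3,1,2,3,1,2,3),
  V1        = c(1,1,NA,2,1,NA,1,3,NA,NA,1,NA)
)

result <- df %>%
  group_by(patientid) %>%
  mutate(
    t1 = V1[time == 1],   # assumes exactly one time-1 row per patient
    t2 = V1[time == 2],   # assumes exactly one time-2 row per patient
    V1 = case_when(
      time != 3                         ~ V1,        # leave timepoints 1 and 2 unchanged
      is.na(t1) | is.na(t2)             ~ NA_real_,  # missing at 1 or 2 -> missing
      t1 %in% c(2, 3) | t2 %in% c(2, 3) ~ 2,         # a 2 or 3 at either -> 2
      t1 == 1 & t2 == 1                 ~ 1,         # 1 at both -> 1
      TRUE                              ~ NA_real_
    )
  ) %>%
  ungroup() %>%
  select(-t1, -t2)
```

Because case_when() is vectorized, this avoids calling a function once per patient, and time stays numeric instead of becoming character as it does after pivot_longer() in the output above.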
