Statistix
Statistix

Reputation: 83

Collapsing daily longitudinal data into monthly observations by ID in R

I have longitudinal data with >100 rows per subject representing daily observations. I want to collapse columns, by subject ID, into monthly observations (i.e. have multiple rows per ID that are summarizing every 30 rows (days) of data).

How can you specify such groupings of days using dplyr?

Also of note, all subjects have different total number of days

Edit: data sample below

df<-structure(list(ID = structure(c(100087, 100087, 100087, 100087, 
                                    100087, 100087, 100087, 100087, 100087, 100087, 100087, 100087, 
                                    100087, 100087, 100087, 100087, 100087, 100087, 100087, 100087)), time = structure(c(0, 1, 2, 3, 
                                                                                                                         4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19)), 
                   BMI = structure(c(20.06, 20.06, 20.06, 20.06, 20.06, 20.06, 
                                     20.06, 20.06, 20.06, 20.06, 20.06, 20.06, 20.06, 20.06, 20.06, 
                                     20.06, 20.06, 20.06, 20.06, 20.06)), Dis = structure(c(0, 
                                                                                            0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), 
                   Drug1 = structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
                                       1, 1, 1, 1, 1, 1, 1)), Drug2 = structure(c(1, 
                                                                                  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1))), row.names = c(NA, 
                                                                                                                                                            -20L), class = c("tbl_df", "tbl", "data.frame"))

Upvotes: 1

Views: 115

Answers (1)

TimTeaFan
TimTeaFan

Reputation: 18561

I would group_by ID and a new times variable where you can specify the bins with time %/% 30 for 30 rows. Since your example data has only a few rows I set this to 5. Since each respondent has a different amount of times we need to record first_time and last_time to then overwrite times as x - y times where x and y are first and last time.

In the across call you would need to specify the way you want to aggregate the data, below I choose mean. In case you want to get the mean of BMI and the max value of Drug1 you would need to specify each column in a separate function call.

library(dplyr)

df %>% 
  group_by(ID, times = time %/% 5) %>% 
  summarise(across(BMI:Drug2, mean),
         time_first = first(time),
         time_last = last(time)
         ) %>% 
  ungroup() %>% 
  mutate(times = paste0(time_first, "-", time_last)) %>% 
  select(-c(time_first, time_last))

#> `summarise()` has grouped output by 'ID'. You can override using the `.groups`
#> argument.
#> # A tibble: 4 × 6
#>       ID times   BMI   Dis Drug1 Drug2
#>    <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 100087 0-4    20.1     0     1     1
#> 2 100087 5-9    20.1     0     1     1
#> 3 100087 10-14  20.1     0     1     1
#> 4 100087 15-19  20.1     0     1     1

# OPs data
df <- structure(list(ID = structure(c(100087, 100087, 100087, 100087, 100087, 100087, 100087, 100087, 100087, 100087, 100087, 100087, 100087, 100087, 100087, 100087, 100087, 100087, 100087, 100087)), time = structure(c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19)), BMI = structure(c(20.06, 20.06, 20.06, 20.06, 20.06, 20.06, 20.06, 20.06, 20.06, 20.06, 20.06, 20.06, 20.06, 20.06, 20.06, 20.06, 20.06, 20.06, 20.06, 20.06)), Dis = structure(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), Drug1 = structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)), Drug2 = structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1))), row.names = c(NA, -20L), class = c("tbl_df", "tbl", "data.frame"))

Created on 2022-09-27 by the reprex package (v0.3.0)

Upvotes: 1

Related Questions