Ko Htut

Reputation: 33

How to find the average difference between time points in longitudinal data


I have longitudinal body-weight data for over 100K participants. The measurement time points differ between participants. I want to know the average time difference between the 1st and 2nd measurements, between the 2nd and 3rd, and so on. I also want to know how many participants (or what percentage) have 3 body-weight measurements, and likewise for 4, 5, 6, 7, 8, etc. How can I find these answers in R?

Upvotes: 2

Views: 154

Answers (1)

margusl

Reputation: 17434

Perhaps something like this:

library(dplyr, warn.conflicts = FALSE)
set.seed(1)

# generate some sample data
dates <- seq(as.Date("2000-01-01"), by = "day", length.out = 500)
sample_data <- tibble(
  participant_id = sample(1:1000, size = 5000, replace = TRUE), 
  meas_date = sample(dates, size = 5000, replace = TRUE))  %>% 
  arrange(participant_id, meas_date)
sample_data
#> # A tibble: 5,000 × 2
#>    participant_id meas_date 
#>             <int> <date>    
#>  1              1 2000-01-18
#>  2              1 2000-02-28
#>  3              1 2000-05-15
#>  4              1 2001-02-01
#>  5              2 2000-05-11
#>  6              3 2000-01-22
#>  7              3 2000-03-27
#>  8              3 2000-04-17
#>  9              3 2000-09-23
#> 10              3 2000-12-13
#> # … with 4,990 more rows

# periods between each measurement for each participant
meas_periods <- sample_data %>% 
  group_by(participant_id) %>% 
  mutate(meas_n = row_number(),
         date_diff = meas_date - lag(meas_date)) %>%
  ungroup()
meas_periods
#> # A tibble: 5,000 × 4
#>    participant_id meas_date  meas_n date_diff
#>             <int> <date>      <int> <drtn>   
#>  1              1 2000-01-18      1  NA days 
#>  2              1 2000-02-28      2  41 days 
#>  3              1 2000-05-15      3  77 days 
#>  4              1 2001-02-01      4 262 days 
#>  5              2 2000-05-11      1  NA days 
#>  6              3 2000-01-22      1  NA days 
#>  7              3 2000-03-27      2  65 days 
#>  8              3 2000-04-17      3  21 days 
#>  9              3 2000-09-23      4 159 days 
#> 10              3 2000-12-13      5  81 days 
#> # … with 4,990 more rows

# average period between meas_n-1 and meas_n
meas_periods %>% 
  group_by(meas_n) %>% 
  summarise(mean_duration = mean(date_diff))
#> # A tibble: 13 × 2
#>    meas_n mean_duration
#>     <int> <drtn>       
#>  1      1       NA days
#>  2      2 88.54102 days
#>  3      3 86.16762 days
#>  4      4 76.21154 days
#>  5      5 69.11392 days
#>  6      6 67.16798 days
#>  7      7 50.67089 days
#>  8      8 50.91111 days
#>  9      9 49.89873 days
#> 10     10 48.70588 days
#> 11     11 51.00000 days
#> 12     12 26.25000 days
#> 13     13 66.00000 days

# number and share of participants with at least meas_n measurements
# (percent is a proportion between 0 and 1, relative to participants with a 1st measurement)
meas_periods %>% 
  count(meas_n, name = "participant_n") %>% 
  mutate(percent = participant_n/max(participant_n))
#> # A tibble: 13 × 3
#>    meas_n participant_n percent
#>     <int>         <int>   <dbl>
#>  1      1           996 1      
#>  2      2           963 0.967  
#>  3      3           877 0.881  
#>  4      4           728 0.731  
#>  5      5           553 0.555  
#>  6      6           381 0.383  
#>  7      7           237 0.238  
#>  8      8           135 0.136  
#>  9      9            79 0.0793 
#> 10     10            34 0.0341 
#> 11     11            12 0.0120 
#> 12     12             4 0.00402
#> 13     13             1 0.00100

Created on 2022-11-02 with reprex v2.0.2
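
The last table above counts participants who reached at least meas_n measurements. If the question is read as asking how many participants have exactly 3, 4, 5, ... measurements, a minimal base R sketch could look like the following. The `toy` data frame and the names `n_per_participant` and `exact_counts` are made up for illustration and are not part of the original answer; the column names simply mirror the sample data.

```r
# Hypothetical toy data: one row per measurement
# (participant_id / meas_date mirror the sample data above)
toy <- data.frame(
  participant_id = c(1, 1, 1, 2, 3, 3, 3, 4, 4),
  meas_date = as.Date(c("2000-01-18", "2000-02-28", "2000-05-15",
                        "2000-05-11",
                        "2000-01-22", "2000-03-27", "2000-04-17",
                        "2000-02-01", "2000-03-01"))
)

# number of measurements recorded for each participant
n_per_participant <- table(toy$participant_id)

# number of participants having exactly n_meas measurements
exact_counts <- as.data.frame(
  table(n_meas = as.vector(n_per_participant)),
  responseName = "participant_n",
  stringsAsFactors = FALSE
)
exact_counts$n_meas <- as.integer(exact_counts$n_meas)

# share of all participants, as a percentage (0-100)
exact_counts$percent <- 100 * exact_counts$participant_n /
  sum(exact_counts$participant_n)

exact_counts
```

In this toy data, participants 1 and 3 each have three measurements, so the row for n_meas = 3 reports 2 participants, or 50% of the four participants. On the real data the same steps would apply to the actual ID column.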

Upvotes: 1
