AEP

Reputation: 172

How to split output by timepoint in long format time-series data?

I would like to use gg_miss_var() from the naniar package to look at the amount of missing data at each timepoint in my data frame. The data frame includes time-series data in long format.

I have code that works with the df overall (see #1 in Code below). How can I extend this to produce the output split by Timepoint (see #2 in Code below for my attempted code)?

To clarify, what I would like to do is essentially repeat #1 in Code using data from each timepoint (per the Timepoint variable). Therefore, the amount of missing data for each variable would be presented for baseline data, year1 data, and year2 data, separately. Currently, #1 in Code looks at the missing data for all timepoints (i.e., baseline, year1, year2) combined.

I will be running further analyses on this data frame split by Timepoint (regressions, for example), so ideally I would like code that is easy to adapt for those purposes.

Below is an example data frame (see Example Data). Note that the data frame I am working with is much larger (N = ~21,900).

Code

library(tidyverse)
library(naniar) # for gg_miss_var()

# 1. All missing data
gg_miss_var(df[,c("Score.1","Score.2","Score.3","Score.4")]) 

# 2. Missing data split by timepoint [does not work]
df %>% 
  group_by(Timepoint) %>% 
  gg_miss_var(.[,c("Score.1","Score.2","Score.3","Score.4")]) %>%
  ungroup()

Example Data

df <- structure(
  list(
    ID = c(1L, 1L, 1L, 2L, 2L, 3L),
    Timepoint = c("baseline", "year1", "year2", "baseline", "year1", "baseline"),
    Score.1 = c(NA, 6, 4, 4, 5, 5),
    Score.2 = c(11, 10, 8, 8, 8, 9),
    Score.3 = c(4, NA, 9, 10, 8, 6),
    Score.4 = c(22, 50, 33, 28, 27, 33)
  ),
  row.names = c(NA, -6L),
  class = c("tbl_df", "tbl", "data.frame")
)

Example Output

The below is the output produced for #1 in Code. What I would like is copies of the below, using only data from the corresponding timepoint (i.e., baseline, year1, year2).

[image: gg_miss_var() output for #1 in Code, all timepoints combined]

Upvotes: 0

Views: 156

Answers (1)

Marek Fiołka

Reputation: 4949

Is this what you are looking for?

library(tidyverse)
library(naniar)
df %>% select(-ID) %>% 
  group_by(Timepoint) %>% 
  gg_miss_var(facet = Timepoint)

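If you also want the numbers behind the faceted plot, a grouped summary gives them as a table. The following is a minimal sketch using only dplyr; the name `missing_by_timepoint` is made up for illustration, and the example data from the question is rebuilt so the snippet runs on its own. naniar's miss_var_summary() should give a similar long-format table when applied to grouped data.

```r
library(dplyr)

# Example data from the question, rebuilt so the snippet is self-contained
df <- tibble::tibble(
  ID        = c(1L, 1L, 1L, 2L, 2L, 3L),
  Timepoint = c("baseline", "year1", "year2", "baseline", "year1", "baseline"),
  Score.1   = c(NA, 6, 4, 4, 5, 5),
  Score.2   = c(11, 10, 8, 8, 8, 9),
  Score.3   = c(4, NA, 9, 10, 8, 6),
  Score.4   = c(22, 50, 33, 28, 27, 33)
)

# Count the missing values in every Score column, one row per timepoint
# (missing_by_timepoint is a hypothetical name)
missing_by_timepoint <- df %>%
  group_by(Timepoint) %>%
  summarise(across(starts_with("Score"), ~ sum(is.na(.x))), .groups = "drop")

missing_by_timepoint
```

Because the result is an ordinary tibble, it can be filtered, joined, or plotted like any other data frame.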

Unless you prefer three separate plots. Then do this:

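# group_map() passes each group's data as .x and its one-row key tibble as .y,
# so the title is taken from .y$Timepoint
df %>% select(-ID) %>% 
  group_by(Timepoint) %>% 
  group_map(~ gg_miss_var(.x) + ggtitle(.y$Timepoint))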

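Since the question mentions further per-timepoint analyses such as regressions, one pattern that may be easier to reuse is to split the data frame into a named list once and then map over it. This is only a sketch: `by_timepoint`, `missing_counts`, and `models` are hypothetical names, and the formula `Score.4 ~ Score.2` is a placeholder, not a suggested model. On the six-row example data the fits are not meaningful; the point is the shape of the code.

```r
library(dplyr)
library(purrr)

# Example data from the question, rebuilt so the snippet is self-contained
df <- tibble::tibble(
  ID        = c(1L, 1L, 1L, 2L, 2L, 3L),
  Timepoint = c("baseline", "year1", "year2", "baseline", "year1", "baseline"),
  Score.1   = c(NA, 6, 4, 4, 5, 5),
  Score.2   = c(11, 10, 8, 8, 8, 9),
  Score.3   = c(4, NA, 9, 10, 8, 6),
  Score.4   = c(22, 50, 33, 28, 27, 33)
)

# One data frame per timepoint; the list is named by the Timepoint values
by_timepoint <- split(df, df$Timepoint)

# Missing-value counts per Score column at each timepoint
missing_counts <- map(
  by_timepoint,
  ~ colSums(is.na(select(.x, starts_with("Score"))))
)

# The same pattern applied to a model (placeholder formula, assumption only)
models <- map(by_timepoint, ~ lm(Score.4 ~ Score.2, data = .x))

missing_counts$baseline
```

Each element can then be accessed by name, for example `by_timepoint$year1` or `models$baseline`, and the same `map()` call can wrap gg_miss_var() if a named list of plots is preferred.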

Is this the effect you were expecting?

Upvotes: 1
