Reputation: 1
I have a data frame with 80 subjects, each of which is supposed to have 50 observations. However, because of some exclusion criteria applied earlier in data processing, not every subject has 50 observations any more. A subsequent analysis procedure needs the data frame to have the full 80 * 50 rows, so I need to add the missing rows back and assign them a value of 0. How can I achieve this?
I am using a simplified situation to illustrate the point. Suppose the data frame has three columns: Subj, TimeBin, and Value. Suppose there are 3 Subjs: S001, S002, S003; and there are 6 TimeBins: T0, T1, T2, T3, T4, T5. Now, S001 and S002 have all necessary observations, but S003 is missing observations at T2 and T5. How should I make up those two missing rows?
Thank you!
Upvotes: 0
Views: 514
Reputation: 33782
Let's try to recreate the situation you describe.
Here's a data frame where the Value for (S003, T2) and (S003, T5) is NA:
library(dplyr)
library(tidyr)
set.seed(1001)
df1 <- data.frame(Subj = rep(c("S001", "S002", "S003"), each = 6),
                  TimeBin = rep(c("T0", "T1", "T2", "T3", "T4", "T5"), 3),
                  Value = sample(1:50, 18, replace = TRUE)) %>%
  # set Value to NA for S003 at T2 and T5
  mutate(Value = ifelse(Subj == "S003" & grepl("T[25]", TimeBin), NA, Value))
"some exclusion criteria applied previously in data processing" - you don't specify what that is, but let's just omit rows with NA values:
df1 <- na.omit(df1)
tidyr::complete() can handle this, provided that every Subj and every TimeBin still appears at least once in the data (for example, when at least some of the subjects have a complete set of rows):
df1 %>%
  complete(Subj, TimeBin, fill = list(Value = 0))
If some Subj or TimeBin values are missing from the processed data entirely, complete() cannot recover them from the data alone, and you will have to devise some sort of join between the processed and original data.
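Here is a minimal sketch of what such a join might look like. It assumes you know the full set of subjects and time bins from the study design; the names all_subj, all_bins, processed, full_grid and filled are made up for illustration, and the values in processed are invented. The idea is to build every Subj/TimeBin combination first, left-join the processed data onto it, and then turn the resulting NA values into 0.

```r
library(dplyr)
library(tidyr)

# Hypothetical processed data: S002 was excluded entirely,
# and no remaining subject has an observation at T5
processed <- data.frame(
  Subj    = c(rep("S001", 5), rep("S003", 4)),
  TimeBin = c("T0", "T1", "T2", "T3", "T4", "T0", "T1", "T3", "T4"),
  Value   = c(12, 7, 30, 45, 3, 21, 9, 18, 40),
  stringsAsFactors = FALSE
)

# Assumption: the complete design is known independently of the data
all_subj <- c("S001", "S002", "S003")
all_bins <- paste0("T", 0:5)

# Every Subj x TimeBin combination (3 * 6 = 18 rows)
full_grid <- expand_grid(Subj = all_subj, TimeBin = all_bins)

filled <- full_grid %>%
  # rows absent from processed come through with Value = NA
  left_join(processed, by = c("Subj", "TimeBin")) %>%
  # replace those NA values with 0
  mutate(Value = ifelse(is.na(Value), 0, Value))
```

With your real data you would take the subject and time bin vectors from the original, unfiltered data frame (for example with unique()) instead of typing them in. Note that this treats any NA already present in Value as a missing row too, which may or may not be what you want.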
Upvotes: 1