How to ignore groups with all NAs while imputing data

Question

I have a large panel dataset with thousands of rows. I want to group by firm (gvkey) and impute the NA values within each group, but some groups consist entirely of NAs. I want to skip those groups.

The following code gives me what I want:

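library(dplyr)  # %>%, arrange(), group_by(), mutate()
library(zoo)    # na.approx()

set.seed(123)
num_years <- 5  # years 2010-2014 for each gvkey
fake_data <- data.frame(
  gvkey = rep(c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J"), each = num_years),
  year = rep(2010:2014, 10),
  dltt = rnorm(50))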

for (gvkey in c("A", "B", "D", "E", "F", "G", "H", "I", "J")) {
  year_to_replace <- sample(c(2011, 2012, 2013), size = sample(2:3, 1), replace = FALSE)
  fake_data$dltt[fake_data$gvkey == gvkey & fake_data$year %in% year_to_replace] <- NA
}

fake_data <- fake_data %>%
  arrange(gvkey, year) %>%
  group_by(gvkey) %>%
  mutate(dltt_imputed = na.approx(dltt))

But I get an error as soon as one group contains only NAs:

fake_data$dltt[fake_data$gvkey == "C"] <- NA

fake_data <- fake_data %>%
  arrange(gvkey, year) %>%
  group_by(gvkey) %>%
  mutate(dltt_imputed = na.approx(dltt))

Could someone help me add a condition to the pipe so that such groups are skipped?


Answers (1)
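
One possible approach, offered as a minimal sketch rather than a definitive answer: linear interpolation needs at least two non-NA points per group, so the call to na.approx() can be wrapped in an if/else inside mutate() that checks the count of non-missing values for each group. The data frame toy and its values below are made up for illustration. Passing na.rm = FALSE is an added assumption, so that any leading or trailing NAs are kept and the result has the same length as the group.

```r
library(dplyr)
library(zoo)

# Hypothetical toy panel: group "C" is entirely NA
toy <- data.frame(
  gvkey = rep(c("A", "B", "C"), each = 5),
  year  = rep(2010:2014, 3),
  dltt  = c(1, NA, 3, NA, 5,
            10, 20, NA, 40, 50,
            NA, NA, NA, NA, NA)
)

toy_imputed <- toy %>%
  arrange(gvkey, year) %>%
  group_by(gvkey) %>%
  mutate(
    # Interpolate only when the group has at least two observed values;
    # otherwise return the column unchanged (it stays NA).
    dltt_imputed = if (sum(!is.na(dltt)) >= 2) {
      na.approx(dltt, na.rm = FALSE)
    } else {
      dltt
    }
  ) %>%
  ungroup()

print(toy_imputed)
```

Groups A and B are interpolated, while group C keeps its NAs and no longer triggers the error. If the all-NA groups should be dropped from the result instead of kept, a grouped filter(sum(!is.na(dltt)) >= 2) placed before the mutate() step is another option.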
