Replacing NA depending on distribution type of gender for all variables at once in R

Question

In my previous question, "Replacing NA depending on distribution type of gender in R", I asked how to replace NA values depending on the distribution type. Lstat's solution works great:

library(dplyr)

# Within each sex: keep observed values; fill NAs with the mean if the
# Shapiro-Wilk test does not reject normality (p > 0.05), else the median
data %>%
  group_by(sex) %>%
  mutate(
    emotion = ifelse(!is.na(emotion), emotion,
                     ifelse(shapiro.test(emotion)$p.value > 0.05,
                            mean(emotion, na.rm = TRUE),
                            quantile(emotion, na.rm = TRUE, probs = 0.5))),
    IQ = ifelse(!is.na(IQ), IQ,
                ifelse(shapiro.test(IQ)$p.value > 0.05,
                       mean(IQ, na.rm = TRUE),
                       quantile(IQ, na.rm = TRUE, probs = 0.5)))
  )
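Both ifelse() blocks above apply the same rule to one column, so the rule can be pulled out into a single function of a vector. This is a minimal sketch; the name impute_normal_or_median, the alpha argument, and the fallback to the median when shapiro.test() cannot run (fewer than 3 or more than 5000 non-missing values, or all values identical) are assumptions added for illustration, not part of Lstat's solution.

```r
# Hypothetical helper (not from the original answer): fill the NAs in x with
# the mean if the Shapiro-Wilk test does not reject normality (p > alpha),
# otherwise with the median.
impute_normal_or_median <- function(x, alpha = 0.05) {
  if (!anyNA(x)) return(x)                  # nothing to fill
  observed <- x[!is.na(x)]
  # shapiro.test() needs 3 to 5000 non-missing, not-all-identical values;
  # outside that range this sketch falls back to the median.
  can_test <- length(observed) >= 3 && length(observed) <= 5000 &&
    length(unique(observed)) > 1
  is_normal <- can_test && shapiro.test(observed)$p.value > alpha
  x[is.na(x)] <- if (is_normal) mean(observed) else median(observed)
  x
}

# Emotion scores for sex == 1 from the data below
impute_normal_or_median(c(20, 15, 49, NA))
# [1] 20 15 49 28
```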

But what if I have 20 or more variables? How can I make this code work for all variables at once? I don't want to write a separate line for each one:

var1 = ifelse(...)
var2 = ifelse(...)
...
var20 = ifelse(...)

Here's the data

data <- structure(list(sex = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
                       emotion = c(20L, 15L, 49L, NA, 34L, 35L, 54L, 45L),
                       IQ = c(101L, 98L, 105L, NA, 123L, 120L, 115L, NA)),
                  .Names = c("sex", "emotion", "IQ"),
                  class = "data.frame", row.names = c(NA, -8L))
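If dplyr is not an option, the same per-group rule can be applied to every non-grouping column with base R. A minimal sketch, assuming the grouping column is named sex and every other column should be filled; fill_na is a hypothetical helper name, and, like the original code, it assumes each group with missing values has at least 3 non-missing, not-all-identical values so that shapiro.test() can run.

```r
# Same data as above, written with data.frame() for readability
data <- data.frame(
  sex     = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  emotion = c(20L, 15L, 49L, NA, 34L, 35L, 54L, 45L),
  IQ      = c(101L, 98L, 105L, NA, 123L, 120L, 115L, NA)
)

# Hypothetical helper: mean if Shapiro-Wilk does not reject normality
# (p > 0.05), otherwise median. shapiro.test() drops NAs itself.
fill_na <- function(x) {
  if (!anyNA(x)) return(x)
  normal <- shapiro.test(x)$p.value > 0.05
  x[is.na(x)] <- if (normal) mean(x, na.rm = TRUE) else median(x, na.rm = TRUE)
  x
}

# Every column except the grouping column, however many there are
value_cols <- setdiff(names(data), "sex")
for (col in value_cols) {
  # ave() splits the column by sex, applies fill_na to each group and
  # puts the results back in the original row order
  data[[col]] <- ave(as.numeric(data[[col]]), data$sex, FUN = fill_na)
}
data
```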

MKR · Accepted Answer

You can use dplyr::mutate_at to apply the same function to multiple columns.

Suppose you want to apply the same function to both the emotion and IQ columns; the solution can then be written as:

library(dplyr)
data %>%
  group_by(sex) %>%
  # funs() is deprecated since dplyr 0.8.0; a ~ formula lambda does the same job
  mutate_at(vars(c("emotion", "IQ")),
            ~ ifelse(!is.na(.), ., ifelse(shapiro.test(.)$p.value > 0.05,
                                          mean(., na.rm = TRUE),
                                          quantile(., na.rm = TRUE, probs = 0.5))))

# # A tibble: 8 x 3
# # Groups: sex [2]
#     sex emotion    IQ
#   <int>   <dbl> <dbl>
# 1     1    20.0 101  
# 2     1    15.0  98.0
# 3     1    49.0 105  
# 4     1    28.0 101  
# 5     2    34.0 123  
# 6     2    35.0 120  
# 7     2    54.0 115  
# 8     2    45.0 119
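In dplyr 1.0.0 and later, the scoped verbs such as mutate_at() are superseded by across(). Inside a grouped mutate(), across(everything()) selects every column except the grouping column, so the code does not change whether there are 2 or 20 variables. This is a sketch assuming dplyr >= 1.0.0; converting each column with as.numeric() is an added choice so that groups with and without NAs return the same type.

```r
library(dplyr)

data <- data.frame(
  sex     = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  emotion = c(20L, 15L, 49L, NA, 34L, 35L, 54L, 45L),
  IQ      = c(101L, 98L, 105L, NA, 123L, 120L, 115L, NA)
)

filled <- data %>%
  group_by(sex) %>%
  # everything() here covers all non-grouping columns (sex is excluded)
  mutate(across(everything(), ~ {
    x <- as.numeric(.x)
    if (anyNA(x)) {
      # mean if Shapiro-Wilk does not reject normality, otherwise median
      normal <- shapiro.test(x)$p.value > 0.05
      x[is.na(x)] <- if (normal) mean(x, na.rm = TRUE) else median(x, na.rm = TRUE)
    }
    x
  })) %>%
  ungroup()

filled
```

To fill only some columns, replace everything() with a selection such as where(is.numeric) or c(emotion, IQ).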
