ℕʘʘḆḽḘ
ℕʘʘḆḽḘ

Reputation: 19395

select non-missing variables in a purrr loop

Consider this example

mydata <- data_frame(ind_1 = c(NA,NA,3,4),
                     ind_2 = c(2,3,4,5),
                     ind_3 = c(5,6,NA,NA),
                     y = c(28,34,25,12),
                     group = c('a','a','b','b'))

> mydata
# A tibble: 4 x 5
  ind_1 ind_2 ind_3     y group
  <dbl> <dbl> <dbl> <dbl> <chr>
1    NA     2     5    28 a    
2    NA     3     6    34 a    
3     3     4    NA    25 b    
4     4     5    NA    12 b 

Here I want, for each group, regress y on whatever variable is not missing in that group, and store the corresponding lm object in a list-column.

That is:

I tried the following but this does not work

mydata %>% group_by(group) %>% nest() %>% 
  do(filtered_df <- . %>% select(which(colMeans(is.na(.)) == 0)),
     myreg = lm(y~ names(filtered_df)))

Any ideas? Thanks!

Upvotes: 6

Views: 246

Answers (2)

moodymudskipper
moodymudskipper

Reputation: 47330

Here's another tidyverse option, assign to mydata$model if you wish to keep it in your tibble :

library(tidyverse)
mydata %>%
  nest(-group) %>%
  pull(data) %>%
  map(~lm(y ~., discard(.,anyNA)))
# [[1]]
# 
# Call:
# lm(formula = y ~ ., data = discard(., anyNA))
# 
# Coefficients:
# (Intercept)        ind_2        ind_3  
#          16            6           NA  
# 
# 
# [[2]]
# 
# Call:
# lm(formula = y ~ ., data = discard(., anyNA))
# 
# Coefficients:
# (Intercept)        ind_1        ind_2  
#          64          -13           NA  
# 
# 

Upvotes: 2

acylam
acylam

Reputation: 18691

We can use map and mutate. We can either select and model in one step (nestdat1) or in separate steps using two map's if you want to preserve the filtered data (nestdat2):

library(tidyverse)

nestdat1 <- mydata %>%
  group_by(group) %>%
  nest() %>%
  mutate(model = data %>% map(~ select_if(., funs(!any(is.na(.)))) %>%
                                lm(y ~ ., data = .)))

nestdat2 <- mydata %>%
  group_by(group) %>%
  nest() %>%
  mutate(data = data %>% map(~ select_if(., funs(!any(is.na(.))))),
         model = data %>% map(~ lm(y ~ ., data = .)))

Output:

They produce different data columns:

> nestdat1 %>% pull(data)
[[1]]
# A tibble: 2 x 4
  ind_1 ind_2 ind_3     y
  <dbl> <dbl> <dbl> <dbl>
1    NA     2     5    28
2    NA     3     6    34

[[2]]
# A tibble: 2 x 4
  ind_1 ind_2 ind_3     y
  <dbl> <dbl> <dbl> <dbl>
1     3     4    NA    25
2     4     5    NA    12

> nestdat2 %>% pull(data)
[[1]]
# A tibble: 2 x 3
  ind_2 ind_3     y
  <dbl> <dbl> <dbl>
1     2     5    28
2     3     6    34

[[2]]
# A tibble: 2 x 3
  ind_1 ind_2     y
  <dbl> <dbl> <dbl>
1     3     4    25
2     4     5    12

But the same model column:

> nestdat1 %>% pull(model)
[[1]]

Call:
lm(formula = y ~ ., data = .)

Coefficients:
(Intercept)        ind_2        ind_3  
         16            6           NA  

[[2]]

Call:
lm(formula = y ~ ., data = .)

Coefficients:
(Intercept)        ind_1        ind_2  
         64          -13           NA  


> nestdat2 %>% pull(model)
[[1]]

Call:
lm(formula = y ~ ., data = .)

Coefficients:
(Intercept)        ind_2        ind_3  
         16            6           NA  

[[2]]

Call:
lm(formula = y ~ ., data = .)

Coefficients:
(Intercept)        ind_1        ind_2  
         64          -13           NA 

Upvotes: 8

Related Questions