Irene

Reputation: 13

step function fails after glm, but only inside a dplyr pipeline

I'm trying to run a stepwise regression using dplyr, but it results in the following error:

Error in as.data.frame.default(data) : cannot coerce class ‘c("glm", "lm")’ to a data.frame

glm runs fine; the error occurs when the code tries to store the result of step in the data frame.

I checked that the objects returned by glm and step have the same class, c("glm", "lm"), yet only the step function fails.

I tried several ways to fix this error, such as a do statement and map2 (passing data as a parameter), but nothing works.

In more detail, when I run this code:

...
  group_by(ITEM_CODE) %>%
  nest() %>%
  mutate(model = map(data, ~ glm(formula_full,family=gaussian(),na.action=na.omit,data=.x))
         ) %>%
  ungroup()

the result looks as follows. Here, glm returns c("glm", "lm"):

> M_CODE     data       model 
> 0034019   <tibble>    <S3: glm>       
> 0040726   <tibble>    <S3: glm>           
> 0057446   <tibble>    <S3: glm>

I'm trying to add the step results as a fourth column, next to the model column.

But when I run the second version of the code, which adds the stepm variable:

  group_by(ITEM_CODE) %>%
  nest() %>%
  mutate(model = map(data, ~ glm(formula_full,family=gaussian(),na.action=na.omit,data=.x))
        ,stepm = map(model, ~ step(.x, direction = "both", trace = 0)) # <-- Error point!
         ) %>%
  ungroup()

the error I mentioned at the start occurs.

As far as I can tell, class(model) and class(stepm) are both c("glm", "lm"), but only stepm is rejected and throws the error.

So I'm very confused. Does anybody know what causes this?

Thank you in advance.

Upvotes: 1

Views: 1198

Answers (2)

Tokaalmighty

Reputation: 402

I ran into this issue today. In my case the cause was that I had given the model being fitted the same name as an existing data frame, so the name no longer referred to a data frame. The fix is simple: rename the model so the two names don't overlap.
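
The name clash described above can be reproduced in a few lines. This is a minimal sketch with made-up data: `sales`, `x1`, `x2` and `noise` are hypothetical names, not from the question, and the numbers are built so that `step()` is certain to drop `x2`. As far as I understand it, `step()` refits the model by re-evaluating the stored `glm()` call, so whatever the name in `data =` refers to at that moment is what gets used:

```r
# Toy data, constructed so that x2 explains nothing and step() must drop it.
x1    <- 1:8
x2    <- c(1, -1, -1, 1, 1, -1, -1, 1)       # orthogonal to the intercept and x1
noise <- c(1, 1, -1, -1, -1, -1, 1, 1) / 2   # orthogonal to intercept, x1 and x2
sales <- data.frame(y = 2 * x1 + noise, x1 = x1, x2 = x2)

fit <- glm(y ~ x1 + x2, family = gaussian(), data = sales)

# While `sales` is still a data frame, the refit inside step() succeeds.
ok <- step(fit, direction = "both", trace = 0)
formula(ok)

# Rebind the name to the model; the stored call still says `data = sales`.
sales <- fit
clash <- tryCatch(step(fit, direction = "both", trace = 0),
                  error = function(err) err)
conditionMessage(clash)   # a "cannot coerce class ..." message, as in the question
```

The same thing may be happening in the question's pipeline: inside `map(model, ~ step(.x, ...))` the name `.x` refers to the model, while the stored call contains `data = .x`.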

Upvotes: 1

pgcudahy

Reputation: 1601

I'm also not sure what causes this error, but I got a clue from here and tried wrapping glm in do.call:

library(tidyverse)
set.seed(101)
# Three simulated data sets, stored one per row in a list-column
model_df <- tibble(
    label = c("a", "b", "c"),
    model_data = list(
        tibble(y = rbinom(100, size = 1, prob = 0.5),
               x1 = rnorm(100),
               x2 = rnorm(100),
               x3 = rnorm(100),
               x4 = rnorm(100)),
        tibble(y = rbinom(100, size = 1, prob = 0.5),
               x1 = rnorm(100),
               x2 = rnorm(100),
               x3 = rnorm(100),
               x4 = rnorm(100)),
        tibble(y = rbinom(100, size = 1, prob = 0.5),
               x1 = rnorm(100),
               x2 = rnorm(100),
               x3 = rnorm(100),
               x4 = rnorm(100))))
model_df <- model_df %>%
    # Fit the full model via do.call, passing each data set as a value
    mutate(model = map(model_data, ~ do.call("glm", list(y ~ x1 + x2 + x3 + x4,
                       family = gaussian(),
                       na.action = na.omit,
                       data = .x)))) %>%
    # Stepwise selection between the intercept-only model and the full model
    mutate(stepm = map(model, ~ step(.x, direction = "both",
                                     scope = list(lower = . ~ 1, upper = formula(.x)),
                                     trace = 0)))
model_df$stepm[[1]]
#> 
#> Call:  glm(formula = y ~ 1, family = structure(list(family = "gaussian", 
#>     link = "identity", linkfun = function (mu) 
...
#> 
#> Coefficients:
#> (Intercept)  
#>        0.54  
#> 
#> Degrees of Freedom: 99 Total (i.e. Null);  99 Residual
#> Null Deviance:       24.84 
#> Residual Deviance: 24.84     AIC: 148.5
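
A possible explanation for why the `do.call` version works, offered as my reading rather than something confirmed above: `do.call` evaluates its arguments before calling `glm`, so the call stored in the model carries the data frame itself instead of the name `.x`. That would also explain why the printed call shows `family = structure(list(...` rather than `gaussian()`. The sketch below uses base R only, with hypothetical names (`toy`, `fit_symbol`, `fit_inline`, `run_step`) and toy data built so that `step()` is certain to drop `x2`; `run_step` stands in for the `~ step(.x, ...)` lambda:

```r
# Toy data: x2 and the noise are orthogonal to everything else, so x2 is dropped.
x1    <- 1:8
x2    <- c(1, -1, -1, 1, 1, -1, -1, 1)
noise <- c(1, 1, -1, -1, -1, -1, 1, 1) / 2
toy   <- data.frame(y = 2 * x1 + noise, x1 = x1, x2 = x2)

# Plain call: the model remembers the *name* `.x` for its data.
fit_symbol <- (function(.x) glm(y ~ x1 + x2, family = gaussian(), data = .x))(toy)

# do.call: arguments are evaluated first, so the data frame is stored in the call.
fit_inline <- (function(.x) {
  do.call("glm", list(y ~ x1 + x2, family = gaussian(), data = .x))
})(toy)

is.symbol(fit_symbol$call$data)       # TRUE
is.data.frame(fit_inline$call$data)   # TRUE

# Stand-in for the purrr lambda: here `.x` is the model, not the data.
run_step <- function(.x) step(.x, direction = "both", trace = 0)

res_symbol <- tryCatch(run_step(fit_symbol), error = function(err) err)
res_inline <- run_step(fit_inline)

inherits(res_symbol, "error")   # refit looks up `.x` and finds the model
formula(res_inline)             # refit uses the stored data frame
```

One trade-off of this approach is that every model carries a full copy of its data inside the call, which is why printing it is so verbose.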

Upvotes: 1
