Reputation: 13
I'm trying to run a stepwise regression using dplyr, but it results in the following error:
Error in as.data.frame.default(data) : cannot coerce class ‘c("glm", "lm")’ to a data.frame
glm itself works, but the error occurs when the code tries to store the result of step in the data frame.
I checked that the objects returned by glm and by step have the same class, c("glm", "lm"), but only the step call fails.
I tried several ways to fix this error, such as a do statement and map2 (passing the data as a parameter), but nothing works.
In more detail, when I run this code:
...
group_by(ITEM_CODE) %>%
  nest() %>%
  mutate(model = map(data, ~ glm(formula_full, family = gaussian(), na.action = na.omit, data = .x))
  ) %>%
  ungroup()
the result looks as follows. Here, glm returns c("glm", "lm"):
> M_CODE data model
> 0034019 <tibble> <S3: glm>
> 0040726 <tibble> <S3: glm>
> 0057446 <tibble> <S3: glm>
I'm trying to add the step results as a 4th column, right after the model column.
But when I run the second version of the code, which adds the stepm variable:
group_by(ITEM_CODE) %>%
  nest() %>%
  mutate(model = map(data, ~ glm(formula_full, family = gaussian(), na.action = na.omit, data = .x))
         , stepm = map(model, ~ step(.x, direction = "both", trace = 0)) # <-- Error point!
  ) %>%
  ungroup()
then the error I mentioned at the start occurs.
As far as I can tell, class(model) = class(stepm) = c("glm", "lm"), but only stepm is rejected and throws the error.
So I'm very confused. Does anybody know about this problem?
Thank you in advance.
Upvotes: 1
Views: 1198
Reputation: 402
I encountered this issue today. In my case, the cause was that I had given the model to be fitted the same name as an existing data frame, so the variable referred to an object of the wrong class. The solution is simple: rename the model so that the names no longer overlap, and you'll be good to go.
Upvotes: 1
Reputation: 1601
I'm also unsure about the cause of this error, but I got a clue from here and tried wrapping glm in do.call:
library(tidyverse)
set.seed(101)
model_df <- tibble(label = c("a", "b", "c"),
                   model_data = list(tibble(y = rbinom(100, size = 1, prob = 0.5),
                                            x1 = rnorm(100),
                                            x2 = rnorm(100),
                                            x3 = rnorm(100),
                                            x4 = rnorm(100)),
                                     tibble(y = rbinom(100, size = 1, prob = 0.5),
                                            x1 = rnorm(100),
                                            x2 = rnorm(100),
                                            x3 = rnorm(100),
                                            x4 = rnorm(100)),
                                     tibble(y = rbinom(100, size = 1, prob = 0.5),
                                            x1 = rnorm(100),
                                            x2 = rnorm(100),
                                            x3 = rnorm(100),
                                            x4 = rnorm(100))))
model_df <- model_df %>%
  mutate(model = map(model_data, ~ do.call("glm", list(y ~ x1 + x2 + x3 + x4,
                                                       family = gaussian(),
                                                       na.action = na.omit,
                                                       data = .x)))) %>%
  mutate(stepm = map(model, ~ step(.x, direction = "both",
                                   scope = list(lower = . ~ 1, upper = formula(.x)),
                                   trace = 0)))
model_df$stepm[[1]]
#>
#> Call: glm(formula = y ~ 1, family = structure(list(family = "gaussian",
#> link = "identity", linkfun = function (mu)
...
#>
#> Coefficients:
#> (Intercept)
#> 0.54
#>
#> Degrees of Freedom: 99 Total (i.e. Null); 99 Residual
#> Null Deviance: 24.84
#> Residual Deviance: 24.84 AIC: 148.5
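A possible explanation for why do.call helps, offered as a sketch rather than a confirmed diagnosis: glm records its own call, and when it is called directly inside a lambda that call stores the symbol .x as the data argument. When step drops a term it refits by re-evaluating that call in the frame it was called from. In a second map(), .x there is the model, not the data, which would produce exactly the "cannot coerce class c("glm", "lm")" error. do.call evaluates its arguments first, so the data frame itself ends up in the stored call. The data set d and the object names below are made up for illustration; x2 is built to be orthogonal to y so that step has a term to drop.

```r
library(purrr)

# Illustrative data (an assumption, not from the question):
# x2 is orthogonal to y, so step() should drop it and trigger a refit.
d <- data.frame(
  x1 = 1:8,
  x2 = c(1, -1, -1, 1, 1, -1, -1, 1)
)
d$y <- d$x1 + 0.5 * c(1, 1, -1, -1, 1, 1, -1, -1)

# glm() called directly in a lambda: the stored call keeps the symbol `.x`.
lazy_fit <- map(list(d), ~ glm(y ~ x1 + x2, family = gaussian(), data = .x))[[1]]
is.name(lazy_fit$call$data)

# do.call() evaluates the arguments first: the call holds the data frame itself.
eager_fit <- map(list(d), ~ do.call("glm", list(y ~ x1 + x2,
                                                family = gaussian(),
                                                data = .x)))[[1]]
is.data.frame(eager_fit$call$data)

# In a separate map(), `.x` is the model, so refitting lazy_fit picks up the
# wrong object; the error message is captured here instead of stopping.
lazy_step <- tryCatch(
  map(list(lazy_fit), ~ step(.x, direction = "both", trace = 0)),
  error = function(e) conditionMessage(e)
)

# Alternative to do.call(): fit and step inside the same lambda,
# where `.x` still refers to the data when step() refits.
same_lambda_step <- map(list(d), ~ {
  fit <- glm(y ~ x1 + x2, family = gaussian(), data = .x)
  step(fit, direction = "both", trace = 0)
})[[1]]
formula(same_lambda_step)
```

If this reading is right, either approach should work with the nested data in the question: keep the do.call wrapper, or move the step call into the same map() as glm so that both see the same .x.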
Upvotes: 1