Reputation: 23
I'm having trouble running a robust linear regression model (using rlm from the MASS package) over a list of data frames.
Reproducible example:
var1 <- c(1:100)
var2 <- var1*var1
df1 <- data.frame(var1, var2)
var1 <- var1 + 50
var2 <- var2*2
df2 <- data.frame(var1, var2)
lst1 <- list(df1, df2)
Linear model (works):
lin_mod <- lapply(lst1, lm, formula = var1 ~ var2)
summary(lin_mod[[1]])
My code for the robust model:
rob_mod <- lapply(lst1, MASS::rlm, formula = var1 ~ var2)
gives the following error:
Error in rlm.default(X[[i]], ...) :
argument "y" is missing, with no default
How could I solve this?
The error in my actual data is:
Error in qr.default(x) : NA/NaN/Inf in foreign function call (arg 1)
In addition: Warning message:
In storage.mode(x) <- "double" : NAs introduced by coercion
Upvotes: 2
Views: 326
Reputation: 17648
You can also try a purrr::map solution:
library(tidyverse)
map(lst1, ~ MASS::rlm(var1 ~ var2, data = .))
or, as joran commented:
map(lst1, MASS:::rlm.formula, formula = var1 ~ var2)
As the help page ?lm shows, lm provides only a formula interface. In contrast, ?rlm documents both a formula method and a default (x, y) method. Thus, you have to pass the data frame as data= so that rlm uses the formula method. Otherwise rlm dispatches on the data frame and wants x and y as input.
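To make the two interfaces concrete, here is a minimal sketch that fits the same model both ways on data shaped like df1 from the question. The names df, X, fit_formula and fit_xy are made up for illustration, and the intercept column is added by hand because the (x, y) interface does not add one for you.

```r
library(MASS)

# Toy data matching df1 from the question
df <- data.frame(var1 = 1:100, var2 = (1:100)^2)

# Formula interface: rlm builds the design matrix (including the intercept)
fit_formula <- rlm(var1 ~ var2, data = df)

# Default (x, y) interface: the caller supplies design matrix and response
X <- cbind(Intercept = 1, var2 = df$var2)
fit_xy <- rlm(X, df$var1)

# Compare the estimates without their (differently named) labels
coef_formula <- unname(coef(fit_formula))
coef_xy <- unname(coef(fit_xy))
same_fit <- isTRUE(all.equal(coef_formula, coef_xy))
```

Both calls should end up in the same fitting routine, so same_fit is expected to be TRUE; the formula version is usually the more convenient one inside map or lapply.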
Upvotes: 3
Reputation: 76641
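Your call is missing the data argument. lapply will call FUN with each member of the list as the first argument of FUN, but data is the second argument to rlm. Because rlm is a generic that dispatches on its first argument, the data frame ends up in rlm.default, which expects x and y.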
The solution is to define an anonymous function.
rob_mod <- lapply(lst1, function(DF) MASS::rlm(formula = var1 ~ var2, data = DF))
summary(rob_mod[[1]])
#
#Call: rlm(formula = var1 ~ var2, data = DF)
#Residuals:
#    Min      1Q  Median      3Q     Max
#-18.707  -5.381   1.768   6.067   7.511
#
#Coefficients:
#            Value   Std. Error t value
#(Intercept) 19.6977  1.0872    18.1179
#var2         0.0092  0.0002    38.2665
#
#Residual standard error: 8.827 on 98 degrees of freedom
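As a rough illustration of the argument matching described above, the sketch below rebuilds the list from the question, reproduces the call that lapply effectively makes, and then uses the wrapper to collect the coefficients of every fit. The names msg and coefs are only illustrative, and sapply(..., coef) is just one way to gather the results.

```r
library(MASS)

# Rebuild the example list from the question
var1 <- 1:100
lst1 <- list(
  data.frame(var1 = var1, var2 = var1^2),
  data.frame(var1 = var1 + 50, var2 = 2 * var1^2)
)

# lapply(lst1, rlm, formula = var1 ~ var2) effectively makes this call:
# the data frame is the first argument, so rlm.default is used and fails.
msg <- tryCatch(
  rlm(lst1[[1]], formula = var1 ~ var2),
  error = function(e) conditionMessage(e)
)

# The wrapper puts the formula first and the data frame in data=
rob_mod <- lapply(lst1, function(DF) rlm(var1 ~ var2, data = DF))

# One column of coefficients per data frame in the list
coefs <- sapply(rob_mod, coef)
```

Here msg holds the error text from the failed call instead of stopping the script, and coefs is a matrix with one row per coefficient, which is handy for comparing the fits side by side.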
Upvotes: 2