Reputation: 1582
I've seen several questions about this error, but none of the answers seem to apply to my situation.
I am building a logistic regression model with biglm.
I have a data.frame with ~250 variables and a little over a million rows.
Since bigglm() doesn't work with the dot notation to select all variables in the model, I am building my formula like this.
So if f is my formula and df is my data frame, then my model looks like this:
fit <- bigglm(f, data = df, family=binomial(link="logit"), chunksize=100, maxit=10)
And I get the error: variable lengths differ (found for 'x')
When I check for length of x it is exactly the same as length of df.
Other StackOverflow questions seem to suggest it might be a problem with the way the formula is constructed. Or perhaps it is a problem with biglm?
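For anyone checking the same things, here is a minimal base-R sketch of the two checks mentioned above: whether every variable named in the formula is actually a column of the data frame, and whether the column lengths match the number of rows. The toy data frame and the names `missing_vars` and `col_lengths` are hypothetical, used only for illustration; the formula is built the same way as in the answer below.

```r
# Hypothetical toy data standing in for the real ~250-column data frame.
df <- data.frame(
  y = c(0, 1, 0, 1),
  x = c(1.5, 2.0, 0.3, 4.1),
  z = c(3, 1, 2, 5)
)

# Build the formula from the column names, excluding the response.
n <- names(df)
f <- as.formula(paste("y ~", paste(n[!n %in% "y"], collapse = " + ")))

# Variables named in the formula that are NOT columns of df.
# Such a name would be looked up outside the data frame instead.
missing_vars <- setdiff(all.vars(f), names(df))

# Length of every column. Note that length(df) is the number of
# columns, so the row count to compare against is nrow(df).
col_lengths <- vapply(df, length, integer(1))
```

If `missing_vars` is non-empty, the formula refers to something that is not in the data frame, which is one way the variables seen by the model can end up with different lengths.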
Reputation: 1582
I was able to solve this issue by making a slight modification in the way I was constructing my formula for bigglm().
As shown in the link attached in my question, I was constructing the formula like this:
n <- names(df)
f <- as.formula(paste("y ~", paste(n[!n %in% "y"], collapse = " + ")))
What f was missing was the df$ before each variable name in the formula. Modifying the as.formula() function to concatenate "df$" to each variable name fixed this issue.
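A minimal sketch of that modification, assuming the prefix is applied to the response as well as to the predictors. The toy data frame and the `predictors` name are hypothetical and only there to make the snippet self-contained; the bigglm() call is the one from the question and is left commented out because it needs the biglm package.

```r
# Hypothetical toy data standing in for the real data frame.
df <- data.frame(
  y = c(0, 1, 0, 1),
  x = c(1.5, 2.0, 0.3, 4.1),
  z = c(3, 1, 2, 5)
)

n <- names(df)
predictors <- n[!n %in% "y"]

# Prefix every variable name with "df$" before building the formula.
f <- as.formula(
  paste("df$y ~", paste0("df$", predictors, collapse = " + "))
)

# The model call from the question is unchanged (requires biglm):
# fit <- bigglm(f, data = df, family=binomial(link="logit"), chunksize=100, maxit=10)
```

Printing f here gives a formula whose terms are all of the form df$name, so each variable is taken directly from df rather than looked up by bare name.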