jgozal

Reputation: 1582

variable lengths differ (found for 'x')

I've seen several questions about this error, but none of the answers seem to apply to my situation.

I am building a logistic regression model with biglm.

I have a data.frame with ~250 variables and a little over a million rows.

Since bigglm() doesn't work with the dot notation (y ~ .) to select all variables in the model, I am building my formula programmatically from the column names.

So if f is my formula and df is my dataframe, then my model looks like this:

library(biglm)
fit <- bigglm(f, data = df, family = binomial(link = "logit"), chunksize = 100, maxit = 10)

And I get the error: variable lengths differ (found for 'x')

When I check the length of x, it is exactly the same as the length of df.

Other StackOverflow questions seem to suggest it might be a problem with the way the formula is constructed. Or perhaps it is a problem with biglm?

Upvotes: 2

Views: 15033

Answers (1)

jgozal

Reputation: 1582

I was able to solve this issue by making a slight modification to the way I was constructing my formula for bigglm().

As shown in the link attached in my question, I was constructing the formula like this:

n <- names(df)
f <- as.formula(paste("y ~", paste(n[!n %in% "y"], collapse = " + ")))

What f was missing was the df$ prefix before each variable name in the formula. Changing the paste() call inside as.formula() to prepend "df$" to each variable name fixed this issue.
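For illustration, here is a minimal sketch of what the modified construction might look like. The toy data frame and the column names a and b are hypothetical stand-ins for the real data, and prefixing the response y as well as the predictors is my reading of "each variable name", not something spelled out above.

```r
# Hypothetical toy data frame standing in for the real df (assumption)
df <- data.frame(
  y = c(0, 1, 0, 1),
  a = c(1.2, 3.4, 2.2, 0.5),
  b = c(5, 3, 8, 1)
)

n <- names(df)
predictors <- n[!n %in% "y"]

# Prepend "df$" to every predictor name, then join them with " + "
rhs <- paste0("df$", predictors, collapse = " + ")

# Assumption: the response gets the same "df$" prefix as the predictors
f <- as.formula(paste("df$y ~", rhs))

print(f)
```

The resulting f can then be passed to bigglm() in the same way as before.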

Upvotes: 0
