Reputation: 1076
I am trying to create chunks of my dataset to run biglm
. (with fastLm I would need 350Gb of RAM)
My complete dataset is called res
. As experiment I drastically decreased the size to 10.000 rows. I want to create chunks to use with biglm.
library(biglm)
formula <- iris$Sepal.Length ~ iris$Sepal.Width
test <- iris[1:10,]
biglm(formula, test)
And somehow, I get the following output:
> test <- iris[1:10,]
> test
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
7 4.6 3.4 1.4 0.3 setosa
8 5.0 3.4 1.5 0.2 setosa
9 4.4 2.9 1.4 0.2 setosa
10 4.9 3.1 1.5 0.1 setosa
Above you can see the matrix test
contains 10 rows. Yet when running biglm
it shows a sample size of 150
> biglm(formula, test)
Large data regression model: biglm(formula, test)
Sample size = 150
Looks like it uses iris
instead of test
.. how is this possible and how do I get biglm to use chunk1 the way I intend it to?
Upvotes: 0
Views: 73
Reputation: 60964
I suspect the following line is to blame:
formula <- iris$Sepal.Length ~ iris$Sepal.Width
where in the formula you explicitly reference the iris
dataset. This will cause R to try and find the iris
dataset when lm
is called, which it finds in the global environment (because of R's scoping rules).
In a formula you normally do not use vectors, but simply the column names:
formula <- Sepal.Length ~ Sepal.Width
This will ensure that the formula contains only the column (or variable) names, which will be found in the data lm
is passed. So, lm
will use test
in stead of iris
.
Upvotes: 2