Reputation: 497
I have a data frame that I split based on a vector of factors. I'm trying to create a model for each data set and then create a set of predicted values from them.
I'm trying to span the predicted values over a large number of values (e.g. length.out = 500), but when I feed the predict function a new data set with 500 rows, it still returns a predicted data frame with the same number of rows as the original data frame fed into the model.
data(mtcars)
rownames(mtcars) <- NULL # I've run this code with and without this line; both times it gave the same result
mtcars.split <- split(mtcars, mtcars$cyl)
mtcars.split <- lapply(mtcars.split, function(x){
  rownames(x) <- NULL
  x <- droplevels(x)
  return(x)
})
mtcars.lm <- lapply(mtcars.split, function(x){
  lm(disp ~ wt, data = x)
})
mtcars.fitted <- mapply(x = mtcars.lm, y = mtcars.split, function(x, y){
  newdata = data.frame(wt = seq(min(y$wt), max(y$wt), length.out = 500))
  fitted <- as.data.frame(predict(x, new.data = newdata, se = T))
  return(fitted)
}, SIMPLIFY = F)
lapply(mtcars.fitted, nrow)
lapply(mtcars.split, nrow)
I tried running the linear model for the entire data set and it did the same thing.
mtcars.lm.all <- lm(disp ~ wt, data = mtcars)
newdata <- data.frame(wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 500))
nrow(as.data.frame(predict(mtcars.lm.all, new.data = newdata, se = T)))
Even attempting to subset the data set didn't make any difference.
mtcars.head <- head(mtcars, n = 16)
mtcars.head.lm <- lm(disp ~ wt, data = mtcars.head)
predict.mtcars <- as.data.frame(predict(mtcars.head.lm,
                                        new.data = data.frame(wt = seq(min(mtcars.head$wt),
                                                                       max(mtcars.head$wt),
                                                                       length.out = 500)),
                                        se = T))
nrow(predict.mtcars)
Am I missing something here? This used to work but it doesn't seem to work now. Even restarting the R session or my computer doesn't seem to make it work.
Upvotes: 0
Views: 1232
Reputation: 408
The argument in the predict function is not new.data but newdata.
The corrected code below gives the desired result.
data(mtcars)
rownames(mtcars) <- NULL # I've run this code with and without this line; both times it gave the same result
mtcars.split <- split(mtcars, mtcars$cyl)
mtcars.split <- lapply(mtcars.split, function(x){
  rownames(x) <- NULL
  x <- droplevels(x)
  return(x)
})
mtcars.lm <- lapply(mtcars.split, function(x){
  lm(disp ~ wt, data = x)
})
mtcars.fitted <- mapply(x = mtcars.lm, y = mtcars.split, function(x, y){
  newdata = data.frame(wt = seq(min(y$wt), max(y$wt), length.out = 500))
  fitted <- as.data.frame(predict(x, newdata = newdata, se = T))
  return(fitted)
}, SIMPLIFY = F)
lapply(mtcars.fitted, nrow)
#> $`4`
#> [1] 500
#>
#> $`6`
#> [1] 500
#>
#> $`8`
#> [1] 500
lapply(mtcars.split, nrow)
#> $`4`
#> [1] 11
#>
#> $`6`
#> [1] 7
#>
#> $`8`
#> [1] 14
mtcars.lm.all <- lm(disp ~ wt, data = mtcars)
newdata <- data.frame(wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 500))
nrow(as.data.frame(predict(mtcars.lm.all, newdata = newdata, se = T)))
#> [1] 500
Created on 2020-07-22 by the reprex package (v0.3.0)
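As for why the misspelling raises no error: as far as I can tell, predict.lm accepts extra arguments through `...`, so an unrecognised name such as new.data is silently absorbed, newdata stays missing, and the fitted values for the original rows come back. Here is a minimal sketch of that behaviour on the full mtcars data; the names fit, grid, wrong and right are only illustrative, and the final stopifnot is just one possible guard, not something the original code used.

```r
# Fit one model on the built-in mtcars data (32 rows).
fit <- lm(disp ~ wt, data = mtcars)

# Grid of 500 wt values spanning the observed range.
grid <- data.frame(wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 500))

# Misspelled argument: `new.data` matches no formal argument of predict.lm(),
# so it falls into `...` and is ignored. The predictions are for the original
# 32 rows, and no warning or error is raised.
wrong <- predict(fit, new.data = grid, se.fit = TRUE)
length(wrong$fit)
#> [1] 32

# Correct argument name: one prediction per row of `grid`.
right <- predict(fit, newdata = grid, se.fit = TRUE)
length(right$fit)
#> [1] 500

# Optional guard: fail loudly if the prediction length does not match the grid.
stopifnot(length(right$fit) == nrow(grid))
```

Note that se = T in the code above still works because se partially matches the se.fit argument; spelling out se.fit = TRUE, as in this sketch, avoids relying on partial matching.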
Upvotes: 3