Using predict() with a vector of specific values in lapply for a list of data.frames

Question

I am trying to achieve the following. I have a list of dataframes in the form of:

list1 <- list(d1=data.frame(name=rep("d1",3), A=c(1,2,3), B=c(2,4,5)),
              d2=data.frame(name=rep("d2",3), A=c(1,2,3), B=c(2,4,5)),
              d3=data.frame(name=rep("d3",3), A=c(1,2,3), B=c(2,4,5)))

For each dataframe in list1, I want to fit a linear model and then use this model for predict(). The values to use for predictions are in a separate dataframe:

new.values <- data.frame(name=c("d1","d2","d3"), B=c(3,4,5))

Each model should be used with only one value from new.values, the one with the corresponding name (e.g. for list1$d1 the value in new.values[new.values$name == "d1", ]), not with all values in new.values$B. I tried this:

predictions <- lapply(list1, function(x) predict(lm(A~B, data=x), new.values[new.values$name == names(x),], interval="predict"))

But predictions remains empty:

> predictions
$d1
     fit lwr upr

$d2
     fit lwr upr

$d3
     fit lwr upr

I guess this is because R doesn't find any values for predict(). If I run

predictions <- lapply(list1, function(x) predict(lm(A~B, data=x), new.values, interval="predict"))

all values in new.values will be used for each model.

How can I fix this?

aosmith · Accepted Answer

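Inside lapply(), x is the data.frame itself, so names(x) returns its column names, not the name of that list item. The comparison new.values$name == names(x) is therefore FALSE for every row, and predict() receives a data.frame with zero rows. To see this, run names(list1[[1]]).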

names(list1[[1]])
"name" "A"    "B"
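To make the failure concrete, here is a minimal sketch that reproduces the subsetting step from the question outside of lapply(). The variable names x and keep are only for illustration; the data are the ones from the question.

```r
# Data from the question
list1 <- list(d1 = data.frame(name = rep("d1", 3), A = c(1, 2, 3), B = c(2, 4, 5)),
              d2 = data.frame(name = rep("d2", 3), A = c(1, 2, 3), B = c(2, 4, 5)),
              d3 = data.frame(name = rep("d3", 3), A = c(1, 2, 3), B = c(2, 4, 5)))
new.values <- data.frame(name = c("d1", "d2", "d3"), B = c(3, 4, 5))

# What lapply() passes to the anonymous function: the data.frame only
x <- list1[["d1"]]

# names(x) holds the column names, so this compares
# c("d1", "d2", "d3") element-wise with c("name", "A", "B")
keep <- new.values$name == names(x)
keep

# No row matches, so the new data handed to predict() has zero rows
new.values[keep, ]
```

Because every comparison is FALSE, the subset is empty, which matches the empty fit/lwr/upr output shown in the question.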

If you want to loop through both the list and the list names simultaneously then purrr::imap() is useful.

The anonymous function will need two arguments, which I call x and y, to refer to the current list element and its name, respectively.

library(purrr)
imap(list1, function(x, y) predict(lm(A~B, data=x), new.values[new.values$name == y,], 
                                   interval="predict")) 
$d1
       fit      lwr      upr
1 1.571429 -2.48742 5.630277

$d2
       fit      lwr      upr
2 2.214286 -1.74179 6.170362

$d3
       fit       lwr      upr
3 2.857143 -1.589103 7.303388
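If you would rather not add a package dependency, the same idea can be sketched in base R with Map(), passing the list and its names as two parallel arguments. This is an alternative to the imap() call above, not part of it; the argument names x and nm are arbitrary.

```r
# Data from the question
list1 <- list(d1 = data.frame(name = rep("d1", 3), A = c(1, 2, 3), B = c(2, 4, 5)),
              d2 = data.frame(name = rep("d2", 3), A = c(1, 2, 3), B = c(2, 4, 5)),
              d3 = data.frame(name = rep("d3", 3), A = c(1, 2, 3), B = c(2, 4, 5)))
new.values <- data.frame(name = c("d1", "d2", "d3"), B = c(3, 4, 5))

# Map() walks over list1 and names(list1) in parallel:
# x is one data.frame, nm is the name of that list element
preds <- Map(function(x, nm) {
  fit <- lm(A ~ B, data = x)
  predict(fit, new.values[new.values$name == nm, ], interval = "predict")
}, list1, names(list1))

preds
```

The result keeps the names of list1, so preds$d1 holds the prediction for the single row of new.values whose name is "d1".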

If your prediction values are also stored in a list, purrr::map2() would be useful for looping through two lists simultaneously.

To show this I'll split the "new.values" object into a list. I can then loop through the two lists (of equal length) via map2(). Instead of writing an anonymous function, I use the formula notation here, where .x refers to the current element of the first list and .y to the matching element of the second.

new.val.list = split(new.values, new.values$name)
map2(list1, new.val.list, ~predict(lm(A~B, data=.x), .y, 
                                 interval="predict"))
$d1
       fit      lwr      upr
1 1.571429 -2.48742 5.630277

$d2
       fit      lwr      upr
2 2.214286 -1.74179 6.170362

$d3
       fit       lwr      upr
3 2.857143 -1.589103 7.303388
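One thing to watch with map2() is that it pairs elements by position, not by name, and split() returns its groups in sorted order of the grouping values. A minimal sketch of aligning the two lists by name first, assuming every name in the data list also occurs in new.values$name (list1r and aligned are illustrative names, and the reordering of list1 is only there to provoke a mismatch):

```r
library(purrr)

# Data from the question
list1 <- list(d1 = data.frame(name = rep("d1", 3), A = c(1, 2, 3), B = c(2, 4, 5)),
              d2 = data.frame(name = rep("d2", 3), A = c(1, 2, 3), B = c(2, 4, 5)),
              d3 = data.frame(name = rep("d3", 3), A = c(1, 2, 3), B = c(2, 4, 5)))
new.values <- data.frame(name = c("d1", "d2", "d3"), B = c(3, 4, 5))

# Same data, but stored in a different order than split() will produce
list1r <- list1[c("d3", "d1", "d2")]

# split() gives d1, d2, d3; indexing by names(list1r) reorders it to d3, d1, d2
new.val.list <- split(new.values, new.values$name)
aligned <- new.val.list[names(list1r)]

preds <- map2(list1r, aligned,
              ~ predict(lm(A ~ B, data = .x), .y, interval = "predict"))
preds
```

With the lists aligned, each model is again paired with the row of new.values that carries its own name.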
