MYaseen208

Reputation: 23898

Using predict function for new data along with tidyverse

I want to use the predict function on new data within a tidyverse pipeline, as in the following example. However, I could not figure out how to supply new data for wt = 4.0 and 4.2. Any hints, please?

library(tidyverse)
mtcars %>%
  dplyr::mutate(cyl1 = factor(cyl)) %>%
  tidyr::nest(-cyl) %>%
  dplyr::mutate(m1  = purrr::map(.x = data, .f = ~ lm(mpg ~ wt, data = .))) %>%
  mutate(Pred = purrr::map(.x = m1, .f = predict)) %>%
  dplyr::pull(Pred)

Upvotes: 1

Views: 2972

Answers (2)

KMK

Reputation: 21

If you follow that code,

mtcars %>%
  group_by(cyl) %>%
  nest %>%
  mutate(m1  = purrr::map(.x = data, .f = ~ lm(mpg ~ wt, data = .))) %>%
  mutate(Pred = purrr::map(.x = m1, ~ predict(., mtcars, interval = "prediction")))

The final column Pred is a list of [32 x 3] matrices. It seems that the lm fitted for each cyl group is applied to every row of mtcars (that is, the m1 for cyl 4 is applied to cyl 6, 4, and 8). How do you get the lm fitted for cyl 6 to apply only to, say, a longer dataset containing only cyl 6?
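
One possible way to do this, as a rough sketch rather than a definitive answer: nest the new data by cyl as well, join it to the models, and use purrr::map2 so that each model only sees the rows of its own group. In the sketch below the "new" data is simply mtcars again as a stand-in, and new_nested, models, and res are illustrative names.

```r
library(tidyverse)

# Stand-in for the new data: mtcars itself, nested by cyl
new_nested <- mtcars %>%
  group_by(cyl) %>%
  nest() %>%
  ungroup() %>%
  rename(newdata = data)

# One lm per cyl group
models <- mtcars %>%
  group_by(cyl) %>%
  nest() %>%
  ungroup() %>%
  mutate(m1 = map(data, ~ lm(mpg ~ wt, data = .x)))

# Pair each model with the new data of the same cyl, then predict
res <- models %>%
  inner_join(new_nested, by = "cyl") %>%
  mutate(Pred = map2(m1, newdata,
                     ~ predict(.x, newdata = .y, interval = "prediction")))
```

With this pairing, each element of Pred should have as many rows as the new data for that cyl, not 32.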

Upvotes: 0

missuse

Reputation: 19716

Here is an example of how to fit several models by group and obtain predictions from them.

Define data to predict upon:

newdat <- data.frame(wt = c(4,4.2))
library(tidyverse)  
mtcars %>%
  group_by(cyl) %>% #group by cyl
  nest %>% #nest groups
  mutate(m1  = purrr::map(.x = data, .f = ~ lm(mpg ~ wt, data = .))) %>% #create models
  mutate(Pred = purrr::map(.x = m1, ~ predict(., newdat))) %>% #predict on new data
  pull(Pred) #pull predictions
#output
[[1]]
       1        2 
17.28842 16.73240 

[[2]]
       1        2 
16.98309 15.85369 

[[3]]
       1        2 
15.09828 14.65979 

or slightly modified:

mtcars %>%
  group_by(cyl) %>%
  nest %>%
  mutate(m1  = purrr::map(.x = data, .f = ~ lm(mpg ~ wt, data = .))) %>%
  mutate(Pred = purrr::map(.x = m1, ~ predict(., newdat))) %>%
  select(cyl, Pred) %>%
  unnest #optionally add %>% cbind(newdat = newdat) to show which wt each prediction is for

#output
# A tibble: 6 x 2
    cyl  Pred
  <dbl> <dbl>
1  6.00  17.3
2  6.00  16.7
3  4.00  17.0
4  4.00  15.9
5  8.00  15.1
6  8.00  14.7
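
As an alternative to the cbind idea in the comment above, here is a minimal sketch in which each model returns a small tibble that carries wt next to its prediction, so the pairing does not rely on row order. It assumes tidyr >= 1.0 (for `unnest(cols = ...)`); the name res is only illustrative.

```r
library(tidyverse)

newdat <- data.frame(wt = c(4, 4.2))

res <- mtcars %>%
  group_by(cyl) %>%
  nest() %>%
  ungroup() %>%
  mutate(m1 = map(data, ~ lm(mpg ~ wt, data = .x))) %>%
  # return wt and the prediction together for each model
  mutate(Pred = map(m1, ~ tibble(wt = newdat$wt,
                                 Pred = unname(predict(.x, newdata = newdat))))) %>%
  select(cyl, Pred) %>%
  unnest(cols = Pred)
```

The result should have one row per cyl and wt combination, with columns cyl, wt, and Pred.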

EDIT in response to the question in the comments

To obtain the standard errors, I think it is easiest to define a custom wrapper around predict that returns a data frame of fit and se.fit:

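pred <- function(x, ...){
  # se.fit = TRUE makes predict.lm return a list instead of a vector
  z <- predict.lm(x, se.fit = TRUE, ...)
  # keep only the first two elements: fit and se.fit
  as.data.frame(z[1:2])
}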

mtcars %>%
  mutate(cyl1 = factor(cyl)) %>%
  group_by(cyl) %>%
  nest %>%
  mutate(m1  = purrr::map(.x = data, .f = ~ lm(mpg ~ wt, data = .))) %>%
  mutate(Pred = purrr::map(.x = m1, ~ pred(., newdata = newdat))) %>%
  select(cyl, Pred) %>%
  unnest %>%
  cbind(newdat = newdat)
#output
  cyl      fit    se.fit  wt
1   6 17.28842 1.2581400 4.0
2   6 16.73240 1.5111249 4.2
3   4 16.98309 3.3269446 4.0
4   4 15.85369 3.6813880 4.2
5   8 15.09828 0.5409614 4.0
6   8 14.65979 0.5609545 4.2
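
To see why `z[1:2]` is enough in pred above, here is a small sketch on a single model fitted to all of mtcars (fit and z are illustrative names). With se.fit = TRUE, predict returns a list whose first two elements are the fitted values and their standard errors.

```r
fit <- lm(mpg ~ wt, data = mtcars)
z <- predict(fit, newdata = data.frame(wt = c(4, 4.2)), se.fit = TRUE)

names(z)               # elements of the returned list
as.data.frame(z[1:2])  # fit and se.fit, one row per new wt value
```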

Upvotes: 4
