Dario

Reputation: 371

Add Column of Predicted Values to Data Frame with dplyr

I have a data frame with a column of models, and I am trying to add a column of predicted values to it. A minimal example:

library(dplyr)

exampleTable <- data.frame(x = c(1:5, 1:5),
                           y = c((1:5) + rnorm(5), 2*(5:1)),
                           groups = rep(LETTERS[1:2], each = 5))

# fit one linear model per group and store it in a list column
models <- exampleTable %>% group_by(groups) %>% do(model = lm(y ~ x, data = .))
exampleTable <- left_join(as_tibble(exampleTable), models, by = "groups")

estimates <- exampleTable %>% rowwise() %>% do(Est = predict(.$model, newdata = .["x"]))

How can I add a column of numeric predictions to exampleTable? I tried using mutate to directly add the column to the table without success.

exampleTable <- exampleTable %>% rowwise() %>% mutate(data.frame(Pred = predict(.$model, newdata = .["x"])))

Error: no applicable method for 'predict' applied to an object of class "list"
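
The error most likely arises because, inside mutate(), the pipe placeholder . still refers to the whole table, so .$model is the entire list column rather than a single lm object. A minimal sketch of a rowwise mutate() that refers to the columns by name instead, assuming dplyr >= 1.0 (the set.seed() call and the name withPred are additions for illustration, not part of the original example):

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
exampleTable <- data.frame(x = c(1:5, 1:5),
                           y = c((1:5) + rnorm(5), 2 * (5:1)),
                           groups = rep(LETTERS[1:2], each = 5))

# one lm per group, stored in a list column
models <- exampleTable %>%
  group_by(groups) %>%
  do(model = lm(y ~ x, data = .)) %>%
  ungroup()

withPred <- exampleTable %>%
  left_join(models, by = "groups") %>%
  rowwise() %>%
  # under rowwise(), `model` is the single lm for this row and `x` is a scalar
  mutate(Pred = unname(predict(model, newdata = data.frame(x = x)))) %>%
  ungroup()
```

Pred comes out as an ordinary numeric column, so no separate bind_cols() step is needed.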

For now I use bind_cols to add the estimates to exampleTable, but I am looking for a better solution.

estimates <- exampleTable %>% rowwise() %>% do(data.frame(Pred = predict(.$model, newdata = .["x"])))
exampleTable <- bind_cols(exampleTable, estimates)

How can it be done in a single step?

Upvotes: 11

Views: 15226

Answers (3)

takje

Reputation: 2800

The modelr package offers an elegant tidyverse solution.

The inputs

library(dplyr)
library(purrr)
library(tidyr)

# generate the inputs like in the question
example_table <- data.frame(x = c(1:5, 1:5),
                            y = c((1:5) + rnorm(5), 2*(5:1)),
                            groups = rep(LETTERS[1:2], each = 5))

models <- example_table %>% 
  group_by(groups) %>% 
  do(model = lm(y ~ x, data = .)) %>%
  ungroup()
example_table <- left_join(as_tibble(example_table), models, by = "groups")

The solution

# generate the extra column
example_table %>%
  group_by(groups) %>%
  do(modelr::add_predictions(., first(.$model)))

The explanation

add_predictions adds a new column of predictions to a data frame using a given model. Unfortunately, it only takes one model as an argument. Meet do. Using do, we can run add_predictions separately on each group.

Inside do, . represents the data frame for the current group, .$model is its model column, and first() takes the first model in that group (every row of a group holds the same model).
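
For readers without modelr installed, roughly the same per-group step can be sketched with dplyr::group_modify() and predict(). This is an illustrative alternative, not the approach used above; it assumes dplyr >= 0.8.1, and set.seed() and the name with_pred are additions for the sketch:

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
example_table <- data.frame(x = c(1:5, 1:5),
                            y = c((1:5) + rnorm(5), 2 * (5:1)),
                            groups = rep(LETTERS[1:2], each = 5))

# one lm per group, joined back so every row carries its group's model
models <- example_table %>%
  group_by(groups) %>%
  do(model = lm(y ~ x, data = .)) %>%
  ungroup()
example_table <- left_join(as_tibble(example_table), models, by = "groups")

with_pred <- example_table %>%
  group_by(groups) %>%
  # .x is the data for one group; its model column repeats the same lm,
  # so the first element is used to predict for all rows of that group
  group_modify(~ mutate(.x, pred = unname(predict(.x$model[[1]], newdata = .x)))) %>%
  ungroup()
```

The column is named pred here to mirror the default column name that add_predictions uses.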

Simplified

With only one model, add_predictions works very well.

# take one of the models
model <- example_table$model[[6]]

# generate the extra column
example_table %>%
  modelr::add_predictions(model)

Recipes

Nowadays, the tidyverse is shifting from the modelr package to recipes, so that might be the new way to go once that package matures.

Upvotes: 11

Italo Cegatta

Reputation: 430

Using the tidyverse:

library(dplyr)
library(purrr)
library(tidyr)
library(broom)

exampleTable <- data.frame(
  x = c(1:5, 1:5),
  y = c((1:5) + rnorm(5), 2*(5:1)),
  groups = rep(LETTERS[1:2], each = 5)
)

exampleTable %>% 
  group_by(groups) %>%
  nest() %>% 
  mutate(model = data %>% map(~lm(y ~ x, data = .))) %>% 
  mutate(Pred = map2(model, data, predict)) %>% 
  unnest(Pred, data)

# A tibble: 10 × 4
   groups      Pred     x          y
   <fctr>     <dbl> <int>      <dbl>
1       A  1.284185     1  0.9305908
2       A  1.909262     2  1.9598293
3       A  2.534339     3  3.2812002
4       A  3.159415     4  2.9283637
5       A  3.784492     5  3.5717085
6       B 10.000000     1 10.0000000
7       B  8.000000     2  8.0000000
8       B  6.000000     3  6.0000000
9       B  4.000000     4  4.0000000
10      B  2.000000     5  2.0000000

Upvotes: 9

bramtayl

Reputation: 4024

Eh, this is only slightly better:

answer = 
  exampleTable %>%
  group_by(groups) %>%
  do(lm( y ~ x , data = .) %>% 
       predict %>% 
       tibble(prediction = .)) %>%
  bind_cols(exampleTable)

I was hoping this would work, but it didn't:

answer = 
  exampleTable %>%
  group_by(groups) %>%
  mutate(prediction = 
           lm( y ~ x , data = .) %>% 
           predict)
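
A likely reason the attempt above fails is that data = . hands lm() the whole table rather than the current group, so predict() returns 10 values for a 5-row group. A minimal sketch that drops the data argument, so the formula variables are looked up among the current group's columns (set.seed() is an addition for reproducibility):

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
exampleTable <- data.frame(x = c(1:5, 1:5),
                           y = c((1:5) + rnorm(5), 2 * (5:1)),
                           groups = rep(LETTERS[1:2], each = 5))

answer <- exampleTable %>%
  group_by(groups) %>%
  # without `data =`, lm() resolves y and x per group inside mutate();
  # predict() with no newdata returns the fitted values for those rows
  mutate(prediction = unname(predict(lm(y ~ x)))) %>%
  ungroup()
```

This refits the model inside mutate(), so it suits cases where the fitted model objects themselves are not needed afterwards.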

Upvotes: 1
