SecretIndividual

Reputation: 2529

Preparing data with dplyr gives: NAs introduced by coercion

I am following along with a book on building decision trees and thought I could make a piece of code a bit prettier. This is the code in question:

library(tree)
library(ISLR)
library(dplyr)

attach(Carseats)

High <- ifelse(Sales <= 8, "No", "Yes")
Carseats <- data.frame(Carseats, High)
tree.carseats <- tree(High ~ . - Sales, Carseats)

The code adds a High column to the Carseats data frame and then fits a tree to it.
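A minimal base-R sketch of what those two lines produce, using a small made-up data frame in place of Carseats (`toy` and its Sales values are hypothetical). It shows that `ifelse()` returns a character vector, and that whether `data.frame()` turns that vector into a factor depends on `stringsAsFactors`, whose default changed from TRUE to FALSE in R 4.0.0:

```r
# Hypothetical stand-in for Carseats: a single Sales column with made-up values
toy <- data.frame(Sales = c(5.2, 9.1, 8.0, 11.4))

# ifelse() with "No"/"Yes" as the yes/no values returns a plain character vector
High <- ifelse(toy$Sales <= 8, "No", "Yes")
print(class(High))            # "character"

# stringsAsFactors = TRUE (the default before R 4.0.0): High becomes a factor
old_style <- data.frame(toy, High, stringsAsFactors = TRUE)
print(class(old_style$High))  # "factor"

# stringsAsFactors = FALSE (the default since R 4.0.0): High stays character,
# which is also what dplyr's mutate() gives you
new_style <- data.frame(toy, High, stringsAsFactors = FALSE)
print(class(new_style$High))  # "character"
```

Checking `class(Carseats$High)` after either version is a quick way to see which case applies; on R 4.0.0 or later the book's version would likely end up with a character column too.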

The version I thought would be easier to read is:

library(tree)
library(ISLR)
library(dplyr)

Carseats <- Carseats %>% mutate(High = ifelse(Sales <= 8, "No", "Yes"))
tree.carseats <- tree(High ~ . - Sales, Carseats)

However, running the last line of the altered code gives this warning:

Warning message:
In tree(High ~ . - Sales, Carseats) : NAs introduced by coercion
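
The warning text is the one base R emits when strings that cannot be parsed as numbers are converted to numeric. A minimal sketch, assuming `tree()` ends up treating the non-factor `High` column as a numeric (regression) response and coercing it; the labels below are made up:

```r
# Character class labels like the ones ifelse() produced
labels <- c("No", "Yes", "No")

# Converting them to numbers triggers the warning: "No" and "Yes" cannot be
# parsed as numbers, so each element becomes NA
caught <- NULL
num <- withCallingHandlers(
  as.numeric(labels),
  warning = function(w) {
    caught <<- conditionMessage(w)  # record the warning text
    invokeRestart("muffleWarning")  # suppress it so the script continues quietly
  }
)
print(caught)  # "NAs introduced by coercion"
print(num)     # NA NA NA
```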

With the modified code, calling summary(tree.carseats) then throws an error:

Error in y - frame$yval[object$where] : 
  non-numeric argument to binary operator
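
The error message matches what base R reports for arithmetic on a character vector. A sketch, assuming (as the `y - frame$yval` expression in the message suggests) that `summary()` subtracts fitted values from the stored character response; `y` here is a made-up stand-in:

```r
# Made-up character response, standing in for the High column
y <- c("No", "Yes", "No")

# Subtracting a number from a character vector fails with the same message
msg <- tryCatch(y - 0.5, error = function(e) conditionMessage(e))
print(msg)  # "non-numeric argument to binary operator"
```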

What is wrong with my thinking process here?

Upvotes: 0

Views: 697

Answers (1)

knytt

Reputation: 593

Not sure where the problem originated, but it is solved if you call factor() on the result of if_else().
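
For comparison, a dplyr-free sketch of the same fix on a small made-up data frame (`toy` is hypothetical), using base R's `transform()` in place of `mutate()`. Because `factor()` is applied explicitly, the result does not depend on the R version or on `stringsAsFactors`:

```r
# Hypothetical stand-in for Carseats with made-up Sales values
toy <- data.frame(Sales = c(5.2, 9.1, 8.0, 11.4))

# Wrap ifelse() in factor() so High is a factor, not a character vector
toy <- transform(toy, High = factor(ifelse(Sales <= 8, "No", "Yes")))

print(class(toy$High))   # "factor"
print(levels(toy$High))  # "No" "Yes"
```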

In general, attaching the data directly with attach() is not recommended, because it can lead to unpredictable behavior.
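
One common alternative to `attach()` is `with()`, which evaluates an expression inside a data frame without adding it to the search path. A minimal sketch on a made-up data frame (`toy` is hypothetical):

```r
# Hypothetical stand-in for Carseats with made-up Sales values
toy <- data.frame(Sales = c(5.2, 9.1, 8.0))

# Instead of attach(toy) followed by a bare `Sales`, evaluate the
# expression inside the data frame with with()
High <- with(toy, ifelse(Sales <= 8, "No", "Yes"))
print(High)                 # "No" "Yes" "No"

# The data frame was never attached, so it is not on the search path
print("toy" %in% search())  # FALSE
```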

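library(tree)
library(ISLR)
library(dplyr)

# Load Carseats explicitly instead of attaching it
data("Carseats")

# factor() turns the "No"/"Yes" labels into a factor, so tree() fits a classification tree
Carseats <- Carseats %>% mutate(High = factor(if_else(Sales <= 8, "No", "Yes")))

tree.carseats <- tree(High ~ . - Sales, data = Carseats)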

Upvotes: 1
