Meaning of a statement in R?

Question

I am trying to debug some R code in order to understand it. The statements are as follows:

library(rpart)
X = read.csv("strange_binary.csv");
fit  = rpart(c ~ X + X.1 + X.2 + X.3 + X.4 + X.5 + X.6 + X.7 + X.8 + X.9, method ="class",data=X,minbucket=1,cp=.04);
printcp(fit);
fit = prune(fit,cp=.04);

pred = predict(fit,X[,1:10],type="vector")      # test the classifier on the training data
pred[pred == 2] = "bad"
pred[pred == 1] = "good"

The aim is to build a classifier and to test it on the training data. However, I do not understand the statements:

pred[pred == 2] = "bad"
pred[pred == 1] = "good"

pred == 2 and pred == 1 would each evaluate to TRUE or FALSE, so how can they be used to index a vector? Sorry for the naive question; I come from a C++ background and am taking baby steps in R.

Thanks for your help!

sconfluentus · Accepted Answer

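In R, pred == 2 compares every element of pred with 2 and returns a logical vector of the same length. Indexing with that vector selects the elements where it is TRUE. So this is a way of saying: assign the value "bad" to the subset of pred where pred is equal to 2.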

pred[pred == 2] = "bad"

Likewise, assign the value "good" to the subset of pred where pred is equal to 1:

pred[pred == 1] = "good"

A more idiomatic way to write these assignments in R uses <- instead of =:

pred[pred == 2] <- "bad"
pred[pred == 1] <- "good"

So the two statements relabel the predictions: every element equal to 2 becomes "bad" and every element equal to 1 becomes "good".
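As a rough sketch of what happens under the hood, the snippet below uses a small made-up vector in place of the real output of predict() (the values and the names mask and labelled are assumptions for illustration only). It shows the logical vector that the comparison produces, and also that assigning a string turns the whole vector into character, which is why the second comparison against 1 still works.

```r
# Made-up class codes standing in for the numeric output of predict().
pred <- c(1, 2, 2, 1, 2)

# The comparison is vectorised: it returns one TRUE/FALSE per element.
mask <- pred == 2
print(mask)                  # FALSE TRUE TRUE FALSE TRUE

# Indexing with a logical vector picks the positions where it is TRUE,
# so this assignment overwrites only those elements.
pred[mask] <- "bad"

# The string assignment coerced the vector to character, so the remaining
# values are now "1". In pred == 1 the number is coerced to "1" as well,
# so the comparison still matches them.
pred[pred == 1] <- "good"
print(pred)                  # "good" "bad" "bad" "good" "bad"

# One possible alternative: map the codes to labels in a single step.
labelled <- factor(c(1, 2, 2, 1, 2), levels = c(1, 2), labels = c("good", "bad"))
```

The factor() variant avoids the intermediate state where the vector mixes "1" and "bad", though whether it fits depends on what the rest of the script expects.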

EDIT:

Since you also asked in the comments what each object is: I would recommend executing your code above one line at a time. After each step you can see what has changed by calling str() on the new variable. It shows the dimensions and data types, along with a few example values.

str(fit)
str(pred)

It will help you get a feel for what is occurring at each step.
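For instance, here is a minimal sketch of using str() around one of the assignments, again with made-up values in place of the real predictions (the helper variables was_numeric and is_character are only there for illustration):

```r
# Hypothetical class codes standing in for the output of predict().
pred <- c(1, 2, 2)
str(pred)                           # reports a numeric ("num") vector
was_numeric <- is.numeric(pred)

pred[pred == 2] <- "bad"            # one string assignment coerces the whole vector
str(pred)                           # now reports a character ("chr") vector
is_character <- is.character(pred)
```

Comparing the two str() calls makes the type change visible, which is easy to miss when only reading the code.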
