I am trying to debug some R code in order to understand it. The statements are as follows:
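library(rpart)
X = read.csv("strange_binary.csv")  # features in columns 1-10 (X, X.1, ..., X.9) plus the class column c
fit = rpart(c ~ X + X.1 + X.2 + X.3 + X.4 + X.5 + X.6 + X.7 + X.8 + X.9, method = "class", data = X, minbucket = 1, cp = .04)  # classification tree
printcp(fit)                # print the complexity-parameter (cp) table
fit = prune(fit, cp = .04)  # snip off the least important splits, based on cp = 0.04
pred = predict(fit, X[, 1:10], type = "vector")  # test the classifier on the training data; returns class indices (1, 2)
pred[pred == 2] = "bad"     # relabel class index 2
pred[pred == 1] = "good"    # relabel class index 1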
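Since strange_binary.csv is not available here, this is a minimal sketch of the same pipeline on a small synthetic data set. The data frame d, the feature columns x1 and x2, and the class column cls are hypothetical stand-ins, not the original data. It shows that predict(..., type = "vector") on a classification tree returns the predicted class as a number (an index into the factor levels), which is why the code above compares pred with 1 and 2. It also shows that type = "class" returns the labels directly.

```r
library(rpart)

set.seed(42)
n <- 200
# Hypothetical stand-in for strange_binary.csv: two numeric features and a
# class column coded 1/2, as the recoding step above appears to assume.
d <- data.frame(x1 = runif(n), x2 = runif(n))
d$cls <- factor(ifelse(d$x1 + d$x2 > 1, 2, 1))

fit <- rpart(cls ~ x1 + x2, method = "class", data = d,
             minbucket = 1, cp = 0.04)

# type = "vector": predicted class as a number (index into levels(d$cls))
pred_num <- predict(fit, d[, c("x1", "x2")], type = "vector")
# type = "class": predicted class as a factor carrying the original labels
pred_cls <- predict(fit, d[, c("x1", "x2")], type = "class")

print(head(pred_num))
print(head(pred_cls))
```

Asking for type = "class" avoids the manual index-to-label step, but only if the class column itself already carries meaningful labels.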
The aim is to build a classifier and to test it on the training data. However, I do not understand the statements:
pred[pred == 2] = "bad"
pred[pred == 1] = "good"
pred == 2 and pred == 1 would be either TRUE or FALSE, so how are they being used to index a vector? Sorry for the naive question; I come from a C++ background and am taking baby steps in R.
Thanks for your help!
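pred == 2 does not produce a single TRUE or FALSE. Comparisons in R are vectorised: each element of pred is compared with 2, and the result is a logical vector of the same length. Indexing with a logical vector selects the elements where it is TRUE. So this line says: assign the value "bad" to the subset of pred where pred is equal to 2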
pred[pred == 2] = "bad"
Assign the value "good" to the subset of pred where pred is equal to 1
pred[pred == 1] = "good"
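To make the element-wise behaviour concrete, here is a small sketch on a hypothetical five-element vector (not the real predictions). It shows the logical vector produced by the comparison, what indexing with it selects, and that the one-line subset assignment does the same thing as an explicit C++-style loop. The value 99 is an arbitrary placeholder chosen so that the vector stays numeric.

```r
pred <- c(1, 2, 2, 1, 2)   # hypothetical class indices

mask <- pred == 2          # one TRUE/FALSE per element
print(mask)                # FALSE TRUE TRUE FALSE TRUE
print(pred[mask])          # keeps only the TRUE positions: 2 2 2
print(which(mask))         # the equivalent integer positions: 2 3 5

# Explicit loop, closer to how this would be written in C++
out <- pred
for (i in seq_along(out)) {
  if (out[i] == 2) out[i] <- 99
}

# Vectorised version: assign through the logical index
out2 <- pred
out2[pred == 2] <- 99

print(identical(out, out2))  # TRUE
```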
A more R-like way of assigning values would look like this:
pred[pred == 2] <- "bad"
pred[pred == 1] <- "good"
So the two lines turn the numeric class predictions into labels: every element equal to 2 becomes "bad" and every element equal to 1 becomes "good".
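One subtlety is worth checking, sketched below on a hypothetical four-element vector. An R vector holds a single type, so assigning "bad" into part of a numeric vector converts the whole vector to character. The second line still works because, when a character vector is compared with the number 1, R converts 1 to "1". The last lines show an alternative that is not part of the original code: factor() with levels and labels does the same mapping in one step.

```r
pred <- c(1, 2, 2, 1)           # hypothetical class indices

pred[pred == 2] <- "bad"
print(class(pred))              # "character": the whole vector was coerced
print(pred)                     # "1" "bad" "bad" "1"

pred[pred == 1] <- "good"       # 1 is coerced to "1" for the comparison
print(pred)                     # "good" "bad" "bad" "good"

# Alternative (not in the original code): map codes to labels in one call
codes <- c(1, 2, 2, 1)
lab <- factor(codes, levels = c(1, 2), labels = c("good", "bad"))
print(as.character(lab))        # "good" "bad" "bad" "good"
```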
EDIT:
Since you also asked in the comments what it is: I would recommend executing the code above one line at a time. At each stage you can see what has changed by calling str() to inspect the structure of the new variable. It shows the dimensions and types of the data, along with a few example values.
str(fit)
str(pred)
It will help you get a feel for what is occurring at each step.
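As a small illustration of that advice, here is a sketch on hypothetical objects rather than the real pred and fit. str() reports type and length, so running it before and after the recoding shows the numeric vector turning into a character one. It also summarises a data frame column by column.

```r
pred <- c(1, 2, 2, 1)      # hypothetical class indices
str(pred)                  # reports a numeric ("num") vector of length 4

pred[pred == 2] <- "bad"
str(pred)                  # now reports a character ("chr") vector

# str() works on data frames (and on model objects such as fit) too
d <- data.frame(x1 = c(0.2, 0.7), cls = factor(c(1, 2)))
str(d)                     # one line per column, with its type
```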