Reputation: 9752
Hi, I am using the following R script to build a random forest:
# load the necessary libraries
library(randomForest)
testPP<-numeric()
# load the dataset
QdataTrain <- read.csv('train.csv',header = FALSE)
QdataTest <- read.csv('test.csv',header = FALSE)
QdataTrainX <- subset(QdataTrain,select=-V1)
QdataTrainY<-as.factor(QdataTrain$V1)
QdataTestX <- subset(QdataTest,select=-V1)
QdataTestY<-as.factor(QdataTest$V1)
mdl <- randomForest(QdataTrainX, QdataTrainY)
Running this, I get the following error:
Error in randomForest.default(QdataTrainX, QdataTrainY) :
NA not permitted in predictors
However, I see no occurrence of NA in my data.
For reference, here is my data:
https://docs.google.com/file/d/0B0iDswLYaZ0zUFFsT01BYlRZU0E/edit
Does anyone know why this error is being thrown? I'll keep looking in the meantime. Thanks in advance for any help!
Upvotes: 3
Views: 11590
Reputation: 7130
The given data does contain missing values (7 in total):
sapply(QdataTrainX, function(x) sum(is.na(x)))
##  V2  V3  V4  V5  V6  V7  V8  V9 V10 V11 V12 V13 V14 V15 V16
##   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
## V17 V18 V19 V20 V21 V22 V23 V24 V25 V26 V27 V28 V29
##   0   0   0   0   0   0   1   1   1   1   1   1   1
So columns V23 to V29 each have one missing value. For example:
which(is.na(QdataTrainX$V23))
## 318
This gives the row number of the missing value in V23.
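Once the NA values are located, one simple option is to drop the affected rows before calling randomForest. The sketch below uses a small made-up CSV written to a temporary file (the file contents and the names toy, short_lines, bad_rows, clean, cleanX and cleanY are hypothetical, not taken from the linked data). It also illustrates one plausible cause, which I have not verified against the actual file: read.csv defaults to fill = TRUE, so a line with fewer fields than the others is silently padded with NA.

```r
# Hypothetical toy file standing in for train.csv: line 2 is one field short.
tmp <- tempfile(fileext = ".csv")
writeLines(c("a,1,0.5", "b,2", "a,3,1.5"), tmp)

# read.csv pads the short line with NA because fill = TRUE by default.
toy <- read.csv(tmp, header = FALSE)

# Lines in the raw file that have fewer fields than the parsed data frame.
short_lines <- which(count.fields(tmp, sep = ",") < ncol(toy))

# Rows of the data frame containing at least one NA in any column.
bad_rows <- which(!complete.cases(toy))

# Keep only complete rows, then split into predictors and labels
# the same way the question does.
clean <- toy[complete.cases(toy), ]
cleanX <- subset(clean, select = -V1)
cleanY <- as.factor(clean$V1)
```

Applied to the question's data, the same pattern would be QdataTrain[complete.cases(QdataTrain), ] before building QdataTrainX and QdataTrainY. If dropping rows is not acceptable, the randomForest package also provides na.roughfix and rfImpute for imputing missing values instead.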
Upvotes: 5