auro

Reputation: 1089

R: Using the bigmemory library for classification with randomForest

Has anyone been able to set up a classification (not a regression) using randomForest and the bigmemory library? I am aware that the "formula approach" cannot be used and that we have to resort to the "x = predictors, y = response" approach. It appears that the bigmemory library cannot handle a response vector with categorical values (it is a matrix, after all). In my case, I have two levels, both represented as characters.

According to the bigmemory documentation: "A data frame will have character vectors converted to factors, and then all factors converted to numeric factor levels."

Any suggested workarounds to get randomForest classification to work with bigmemory?

# Example of the problem
library(randomForest)
library(bigmemory)
# Removing any extra objects from my workspace (just in case)
rm(list=ls())

#first small matrix
small.mat <- matrix(sample(0:1,5000,replace = TRUE),1000,5)
colnames(small.mat) <- paste("V",1:5,sep = "")
# Note: a matrix cannot hold a factor, so this stores only the integer level codes
small.mat[,5] <- as.factor(small.mat[,5])
small.rf <- randomForest(V5 ~ .,data = small.mat, mtry=2, do.trace=100)
print(small.rf)
small.result <- matrix(0,1000,1)
# predict() takes 'newdata'; an argument named 'data' is silently ignored
small.result <- predict(small.rf, newdata=small.mat[,-5])

# Now a small data frame: classification works!
small.mat <- matrix(sample(0:1,5000,replace = TRUE),1000,5)
colnames(small.mat) <- paste("V",1:5,sep = "")
small.data <- as.data.frame(small.mat)

small.data[,5] <- as.factor(small.data[,5]) 
small.rf <- randomForest(V5 ~ .,data = small.data, mtry=2, do.trace=100)
print(small.rf)
small.result <- matrix(0,1000,1)
small.result <- predict(small.rf, newdata=small.data[,-5])


#then big matrix Classification Does NOT Work :-(
#----------------****************************----
big.mat <- as.big.matrix(small.mat, type = "integer")
#Line below throws error, "cannot coerce class 'structure("big.matrix", package = "bigmemory")' into a data.frame"
big.rf <- randomForest(V5~.,data = big.mat, do.trace=10)

# Runs without error, but fits a regression because y is numeric, not a factor
big.rf <- randomForest(x = big.mat[,-5], y = big.mat[,5], mtry=2, do.trace=100)
print(big.rf)
big.result <- matrix(0,1000,1)
big.result <- predict(big.rf, newdata=big.mat[,-5])

Upvotes: 5

Views: 1727

Answers (1)

Nhan Vu

Reputation: 11

The bigrf package may help. Currently, it supports classification with a limited number of features.
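
A possible workaround that stays with randomForest itself, sketched under the assumption that the extracted predictors fit in RAM: randomForest fits a classification forest whenever y is a factor, so the response can be converted to a factor after it is pulled out of the big.matrix. The data mirrors the question's example; the seed and the "no"/"yes" labels are hypothetical and chosen only for illustration.

```r
library(randomForest)
library(bigmemory)

set.seed(42)  # hypothetical seed, only for reproducibility

# Same shape as the question's data: 1000 rows, 5 binary columns
small.mat <- matrix(sample(0:1, 5000, replace = TRUE), 1000, 5)
colnames(small.mat) <- paste("V", 1:5, sep = "")
big.mat <- as.big.matrix(small.mat, type = "integer")

# Indexing a big.matrix returns ordinary R objects held in RAM
x <- big.mat[, -5]

# Rebuild the categorical response from the stored integer codes
# (the labels "no"/"yes" are hypothetical)
y <- factor(big.mat[, 5], levels = c(0, 1), labels = c("no", "yes"))

# Because y is a factor, randomForest fits a classification forest
big.rf <- randomForest(x = x, y = y, mtry = 2, ntree = 100)
print(big.rf$type)

big.result <- predict(big.rf, newdata = x)
```

Note that this copies the predictors out of the big.matrix, so it does not save memory during fitting; it only shows that the response, not bigmemory, decides between regression and classification.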

Upvotes: 1
