Reputation: 1
I am working on sentiment analysis in R. I've already built a model with naive Bayes, but I want to try another one: xgboost. I ran into a problem when trying to build the xgboost model, because I don't know what to do with my document term matrix. Can anyone give me a solution?
I've tried converting the document term matrix to a data frame, but it doesn't seem to work.
The code below shows how I currently build my train and test data:
library(tm)
library(magrittr)  # provides the %>% pipe
dtm.tf <- VCorpus(VectorSource(results$text)) %>%
  DocumentTermMatrix()
# split 80:20 (rows 1-312 for training, rows 313-390 for testing)
all.data <- dtm.tf
train.data <- dtm.tf[1:312,]
test.data <- dtm.tf[313:390,]
And I have an xgboost template that I used with another data set:
# install.packages('xgboost')
library(xgboost)
# Column 11 (Exited) is the 0/1 label; all other columns are features
classifier = xgboost(data = as.matrix(training_set[-11]),
                     label = training_set$Exited,
                     objective = "binary:logistic",  # predict probabilities for a 0/1 label
                     nrounds = 10)
# Predicting the Test set results
y_pred = predict(classifier, newdata = as.matrix(test_set[-11]))
y_pred = (y_pred >= 0.5)
# Making the Confusion Matrix
cm = table(test_set[, 11], y_pred)
I want to use the xgboost template above to build my model from my current train and test data. What do I have to do?
Upvotes: 0
Views: 233
Reputation: 23608
You need to transform the document term matrix into a sparse matrix. In your case that can be done with the sparseMatrix function from the Matrix package (which ships with R as a recommended package):
sparse_matrix_tf <- Matrix::sparseMatrix(i=dtm.tf$i, j=dtm.tf$j, x=dtm.tf$v,
                                         dims=c(dtm.tf$nrow, dtm.tf$ncol))
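To sanity-check the conversion, here is a minimal sketch on a hypothetical three-document toy corpus (the texts are made up for illustration). It relies on the fact that a tm DocumentTermMatrix stores its nonzero cells in triplet form (i, j, v) together with nrow, ncol and dimnames, and it compares the sparse result against the dense as.matrix() version. Passing dimnames along, which the call above does not do, keeps the terms as column names.

```r
library(tm)
library(Matrix)

# Hypothetical toy texts, for illustration only
docs <- c("good movie really good acting",
          "bad plot and bad acting",
          "good soundtrack")
dtm <- DocumentTermMatrix(VCorpus(VectorSource(docs)))

# A DocumentTermMatrix is a simple triplet matrix: row index i,
# column index j and value v for every nonzero cell
sparse_dtm <- Matrix::sparseMatrix(i = dtm$i, j = dtm$j, x = dtm$v,
                                   dims = c(dtm$nrow, dtm$ncol),
                                   dimnames = dtm$dimnames)

# The sparse copy should hold exactly the same counts as the dense version
dense_dtm <- as.matrix(dtm)
print(dim(sparse_dtm))
print(all(as.matrix(sparse_dtm) == dense_dtm))
```

The dense as.matrix() route also works for small corpora, but the sparse matrix avoids storing the many zero counts of a real document term matrix.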
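Then you can feed sparse_matrix_tf to xgboost. The label has to be a numeric vector with one value per document (row of dtm.tf), for example your sentiment coded as 0/1. dtm.tf$dimnames$Docs only holds the document names, so here it is just a placeholder for your real labels:
classifier = xgboost(data = sparse_matrix_tf,
                     label = dtm.tf$dimnames$Docs,  # placeholder: replace with your 0/1 sentiment labels
                     nrounds = 10)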
A complete reproducible example is below. I leave the 80/20 split to you.
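library(tm)
library(xgboost)
data("crude")
crude <- as.VCorpus(crude)
dtm.tf <- DocumentTermMatrix(crude)
# convert the simple triplet matrix (i, j, v) into a sparse dgCMatrix
sparse_matrix_tf <- Matrix::sparseMatrix(i=dtm.tf$i, j=dtm.tf$j, x=dtm.tf$v,
                                         dims=c(dtm.tf$nrow, dtm.tf$ncol))
# crude has no sentiment label, so the document IDs are only a placeholder;
# use your own numeric (e.g. 0/1) labels instead
classifier = xgboost(data = sparse_matrix_tf,
                     label = dtm.tf$dimnames$Docs,
                     nrounds = 10)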
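Since the split and the prediction step are left open above, here is a minimal end-to-end sketch that follows the question's template. It assumes a binary sentiment label coded as 0/1; the texts and sentiment vectors below are made-up toy data, so replace them with your own results$text and label column. It uses objective = "binary:logistic" so that predictions are probabilities that can be thresholded at 0.5, and it calls xgb.train() on an xgb.DMatrix instead of the xgboost() wrapper, because the wrapper's arguments differ between older and newer xgboost releases. With data this tiny the model learns very little; the point is only the plumbing.

```r
library(tm)
library(Matrix)
library(xgboost)

# Hypothetical toy data: 10 short reviews with 0/1 sentiment labels
texts <- c("great movie loved it", "terrible plot hated it",
           "great acting and great story", "boring and terrible",
           "loved the soundtrack", "hated the ending",
           "great fun", "boring story",
           "loved every minute", "terrible acting")
sentiment <- c(1, 0, 1, 0, 1, 0, 1, 0, 1, 0)

# Build the document term matrix and convert it to a sparse dgCMatrix
dtm.tf <- DocumentTermMatrix(VCorpus(VectorSource(texts)))
x <- Matrix::sparseMatrix(i = dtm.tf$i, j = dtm.tf$j, x = dtm.tf$v,
                          dims = c(dtm.tf$nrow, dtm.tf$ncol),
                          dimnames = dtm.tf$dimnames)

# Random 80:20 split by row index (rows of x line up with sentiment)
set.seed(42)
train_idx <- sample(seq_len(nrow(x)), size = round(0.8 * nrow(x)))
dtrain <- xgb.DMatrix(x[train_idx, ], label = sentiment[train_idx])
dtest  <- xgb.DMatrix(x[-train_idx, ], label = sentiment[-train_idx])

# Binary classifier: predictions are probabilities in [0, 1]
classifier <- xgb.train(params = list(objective = "binary:logistic", nthread = 1),
                        data = dtrain, nrounds = 10)

# Predict on the held-out rows and threshold at 0.5, as in the template
y_prob <- predict(classifier, dtest)
y_pred <- as.integer(y_prob >= 0.5)

# Confusion matrix: actual labels vs predicted labels
cm <- table(actual = sentiment[-train_idx], predicted = y_pred)
print(cm)
```

Building one document term matrix on all texts and then splitting its rows (as the question does) keeps the train and test columns aligned to the same vocabulary, which xgboost needs at prediction time.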
Upvotes: 0