Reputation: 19375
Consider this example:
library(quanteda)
library(caret)
library(glmnet)
library(dplyr)
dtrain <- tibble(text = c("Chinese Beijing Chinese",
"Chinese Chinese Shanghai",
"Chinese Macao",
"Tokyo Japan Chinese"),
doc_id = 1:4,
class = c("Y", "Y", "Y", "N"))
# now we make the dataframe bigger
dtrain <- purrr::map_df(seq_len(100), function(x) dtrain)
Let's create a sparse document-term matrix and fit a glmnet model:
> dtrain <- dtrain %>% mutate(class = as.factor(class))
> mycorpus <- corpus(dtrain, text_field = 'text')
> trainingdf <- dfm(mycorpus)
> trainingdf
Document-feature matrix of: 400 documents, 6 features (62.5% sparse).
And now we finally turn to the lasso model:
mymodel <- cv.glmnet(x = trainingdf, y = dtrain$class,
                     type.measure = 'class',
                     nfolds = 3,
                     alpha = 1,
                     parallel = FALSE,
                     family = 'binomial')
I have two simple questions.
How can I add the predictions to the original dtrain data? The raw output of
mypred <- predict(mymodel, newx = trainingdf,
                  s = 'lambda.min', type = 'class')
looks HORRIBLY NOT TIDY:
> mypred
1
1 "Y"
2 "Y"
3 "Y"
How can I use caret::confusionMatrix in this setting? Just using the following raises an error:
> confusion <- caret::confusionMatrix(data = mypred,
+ reference = dtrain$class)
Error: `data` and `reference` should be factors with the same levels.
Thanks!
Upvotes: 0
Views: 1522
Reputation: 8364
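In a classification model, your target variable needs to be a factor. The same holds for the predictions you pass to caret::confusionMatrix: as the error message says, both must be factors with the same levels.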
For example: my_data is the dataset you train the model on, and my_target is the target (response) variable.
Note that as.factor(my_data$my_target) finds the correct levels automatically, so you do not need to specify them by hand.
See the difference when we print target before and after the conversion:
target <- c("y", "n", "y", "n")
target
#[1] "y" "n" "y" "n" # this is a simple char
as.factor(target)
# [1] y n y n
# Levels: n y # this is a correct format, a factor with levels
This is important because even if your predictions (or test data) contain only one of the two classes in the target, a factor still records every possible level, including the ones that do not occur.
You can of course set them explicitly. Reusing the levels of the reference keeps both factors in the same order:
my_pred <- factor(mypred, levels = levels(dtrain$class))
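To see why fixing the levels matters, here is a minimal base R sketch. The vector pred_chr is made up for illustration and simply stands in for predictions that happen to contain a single class:

```r
# Hypothetical predictions that contain only one of the two classes
pred_chr <- c("Y", "Y", "Y")

# Without explicit levels, the factor only knows about "Y"
levels(factor(pred_chr))
# [1] "Y"

# With explicit levels, the unseen class "N" is kept as an empty level
pred_fct <- factor(pred_chr, levels = c("N", "Y"))
levels(pred_fct)
# [1] "N" "Y"

# The empty level still shows up when tabulating
table(pred_fct)
```

The second factor can be compared against a two-level reference, while the first one has lost the information that "N" exists.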
To add them to the data, you can use
my_data$newpred <- my_pred
or
library(dplyr)
my_data <- my_data %>% mutate(newpred = my_pred)
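Putting both steps together for the question's setting, a rough sketch could look like the following. It assumes caret is installed, and it does not fit a real model: dtrain is a cut-down stand-in for the question's data, and mypred is a hand-made one-column character matrix that only mimics the shape of the output shown in the question.

```r
library(caret)

# Stand-in for the question's training data, with class already a factor
dtrain <- data.frame(doc_id = 1:4,
                     class  = factor(c("Y", "Y", "Y", "N")))

# Stand-in for the prediction output: a one-column character matrix.
# These values are invented for illustration, not real model output.
mypred <- matrix(c("Y", "Y", "Y", "Y"), ncol = 1,
                 dimnames = list(as.character(1:4), "1"))

# Flatten the matrix and reuse the reference levels so both factors agree
dtrain$pred <- factor(as.vector(mypred), levels = levels(dtrain$class))

# Both arguments are now factors with identical levels
confusion <- caret::confusionMatrix(data = dtrain$pred,
                                    reference = dtrain$class)
confusion$table
```

Storing the prediction as an ordinary column keeps it aligned with doc_id and class, which is usually what "tidy" means here.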
Upvotes: 2