ajax

Reputation: 131

Multiclass text classification using R

I am working on a multiclass text classification problem and have built a gradient boosting model for it.

About the dataset:

The dataset has two columns: "Test_name" and "Description".

There are six labels in the "Test_name" column, and the "Description" column holds the corresponding description text.

My approach towards the problem

Data Preparation

  1. Create a word vector from the descriptions.

  2. Build a corpus using the word vector.

  3. Apply pre-processing: remove numbers, extra whitespace and stopwords, and convert the text to lower case.

  4. Build a document term matrix (dtm).

  5. Remove sparse words from the above dtm.

  6. The above step gives a count frequency matrix in which each column holds the frequency of one word in each document.

  7. Transform the count frequency matrix into a binary instance matrix, which records each word in a document as 1 if it is present and 0 if it is absent.

  8. Append the label column from the original notes dataset to the transformed dtm. The label column has 6 labels.
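Steps 1 to 8 can be sketched in base R as follows. This is only an illustration, not the code I actually ran: the four descriptions, the short stopword list and the `min_docs` cut-off are made up for the example (only the label names come from the real data), and a text-mining package would normally do the same work.

```r
# Toy data (hypothetical); the real data has the columns "Test_name" and "Description".
notes <- data.frame(
  Test_name = c("Diagnostic Imaging", "Body Fluid Analysis",
                "Diagnostic Imaging", "Patient Related"),
  Description = c("Chest X-ray shows 2 clear lungs",
                  "Urine sample sent for fluid analysis",
                  "X-ray of the left knee, second X-ray advised",
                  "Patient asked to repeat the urine sample"),
  stringsAsFactors = FALSE
)
stop_words <- c("the", "of", "for", "to")  # tiny stand-in for a real stopword list
min_docs <- 2                              # assumed sparsity cut-off

# Steps 1-3: lower-case, drop numbers and punctuation, collapse whitespace, drop stopwords.
clean <- tolower(notes$Description)
clean <- gsub("[^a-z ]+", " ", clean)
clean <- trimws(gsub(" +", " ", clean))
tokens <- lapply(strsplit(clean, " ", fixed = TRUE),
                 function(tok) tok[!tok %in% stop_words])

# Step 4: document-term matrix of raw counts (documents in rows, terms in columns).
vocab <- sort(unique(unlist(tokens)))
counts <- t(vapply(tokens,
                   function(tok) tabulate(match(tok, vocab), nbins = length(vocab)),
                   integer(length(vocab))))
colnames(counts) <- vocab

# Steps 5-6: keep only terms that occur in at least min_docs documents.
counts <- counts[, colSums(counts > 0) >= min_docs, drop = FALSE]

# Step 7: binary instance matrix (1 = term present, 0 = absent).
binary <- (counts > 0) * 1L

# Step 8: append the label column.
train <- data.frame(binary, Test_name = factor(notes$Test_name))
print(train)
```

In this toy example "x" occurs twice in the third description, so its count is 2 in `counts` but 1 in `binary`, which is the information step 7 discards.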

Model Building

Using the h2o package, build a GBM model.
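A minimal sketch of this step, assuming the h2o package and a Java runtime are installed. The wrapper name `fit_gbm`, its arguments and the seed are hypothetical, all GBM settings are left at their defaults, and `train_df` stands for the data frame produced by step 8.

```r
# Sketch only: nothing here touches h2o until fit_gbm() is actually called.
fit_gbm <- function(train_df, label_col = "Test_name", seed = 1) {
  # The target must be a factor so that h2o treats the task as multinomial classification.
  train_df[[label_col]] <- as.factor(train_df[[label_col]])
  h2o::h2o.init()
  frame <- h2o::as.h2o(train_df)
  model <- h2o::h2o.gbm(
    x = setdiff(names(train_df), label_col),  # the binary term columns
    y = label_col,
    training_frame = frame,
    seed = seed
  )
  # Confusion matrix on the training frame, as in the reported output.
  print(h2o::h2o.confusionMatrix(model, train = TRUE))
  model
}

# Usage (not run here): model <- fit_gbm(train)
```

Note that `train = TRUE` reports errors on the rows the model was fitted on, so these figures say nothing yet about held-out data.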

Results obtained

Four of the six classes are classified well, but the remaining two are classified poorly.

Below is the output:

Extract training frame with `h2o.getFrame("train")`
MSE: (Extract with `h2o.mse`) 0.1197392
RMSE: (Extract with `h2o.rmse`) 0.3460335
Logloss: (Extract with `h2o.logloss`) 0.3245868
Mean Per-Class Error: 0.3791268
Confusion Matrix: Extract with `h2o.confusionMatrix(<model>,train = TRUE)`)

Body Fluid Analysis =   401 / 2,759
Cytology Test       =   182 / 1,087
Diagnostic Imaging  =   117 / 3,907
Doctors Advice      =      32 / 752
Organ Function Test =     461 / 463
Patient Related     =     101 / 113
Totals              = 1,294 / 9,081
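Each row above reads as errors / total rows for that class. The short base R check below turns those counts into per-class error rates; the variable names are only for illustration. It shows that the unweighted mean of the six rates reproduces the reported Mean Per-Class Error, and that the two classes with the highest error are also the two smallest (463 and 113 of 9,081 rows).

```r
# Errors and row totals per class, copied from the confusion matrix output.
errors <- c("Body Fluid Analysis" = 401, "Cytology Test" = 182,
            "Diagnostic Imaging" = 117, "Doctors Advice" = 32,
            "Organ Function Test" = 461, "Patient Related" = 101)
totals <- c(2759, 1087, 3907, 752, 463, 113)

per_class_error <- errors / totals
print(round(per_class_error, 4))

# Unweighted mean of the six rates (each class counts equally, whatever its size).
mean_per_class_error <- mean(per_class_error)

# Overall error rate, which is dominated by the large classes.
overall_error <- sum(errors) / sum(totals)

print(c(mean_per_class = mean_per_class_error, overall = overall_error))
```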

The misclassification errors for Organ Function Test (461 of 463) and Patient Related (101 of 113) are far higher than for the other four classes. How can I fix this?

Upvotes: 0

Views: 1749

Answers (1)

Sam Abbott

Reputation: 466

Just some quick things you can do to improve this:

If you provide more details and a working example, there is more that can be done to help you.

Upvotes: 0
