Multiclass classification with Naive Bayes and R

So I am trying to classify documents based on their text with Naive Bayes. Each document might belong to 1 to n categories (think of them as tags on a blog post).

My current approach is to provide R with a CSV that looks like this:

+-------------------------+---------+-------+-------+
|    TEXT TO CLASSIFY     | Tag 1   | Tag 2 | Tag 3 |
+-------------------------+---------+-------+-------+
| Some text goes here     | Yes     | No    | No    |
+-------------------------+---------+-------+-------+
| Some other text here    | No      | Yes   | Yes   |
+-------------------------+---------+-------+-------+
| More text goes here     | Yes     | No    | Yes   |
+-------------------------+---------+-------+-------+

Of course, the desired behaviour is to take an input like

Some new text to classify

And an output like

+------+------+-------+
| Tag 1| Tag 2| Tag 3 |
+------+------+-------+
| 0.12 | 0.75 | 0.65  |
+------+------+-------+

And then based on a certain threshold, determine whether or not the given text belongs to tags 1, 2, 3.
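
In code, this is roughly what I am after (just a sketch of the desired end result; the names and the 0.5 cut-off are made up):

    # 'probs' would be the per-tag probabilities returned by some model
    probs     <- c(Tag1 = 0.12, Tag2 = 0.75, Tag3 = 0.65)
    threshold <- 0.5
    names(probs)[probs >= threshold]
    # [1] "Tag2" "Tag3"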

The problem is that in the tutorials I have found, the input looks more like this:

+--------------------------+---------+
|    TEXT TO CLASSIFY      | Class   |
+--------------------------+---------+
| Some other text here     | No      |
+--------------------------+---------+
| Some other text here     | Yes     |
+--------------------------+---------+
| Some other text here     | Yes     |
+--------------------------+---------+

That is, a ROW per text per class... Then using that, yes, I can train Naive Bayes and then use one-vs-all to determine which texts belong to which tags. The question is: can I do this in a more elegant way (that is, with the training data looking like the first example I mentioned)?

One of the examples I found is http://blog.thedigitalgroup.com/rajendras/2015/05/28/supervised-learning-for-text-classification/

Upvotes: 2

Views: 3568

Answers (1)

CAFEBABE

Reputation: 4101

There are conceptually two approaches.

  1. You combine the tags into a single combined tag. Then you would get the joint probability over tag combinations. The main drawback is the combinatorial explosion, which implies that you also need much more training data (a sketch of this appears further below).
  2. You build an individual NB model for each tag (see the sketch right after this list).
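
A minimal sketch of option 2 with the tm and e1071 packages, assuming a data frame df shaped like the first table in the question (the names df, text, Tag1/Tag2/Tag3 and the 0.5 cut-off are assumptions, not part of your data):

    library(tm)      # corpus handling and document-term matrix
    library(e1071)   # naiveBayes()

    # 'df' is assumed to have columns text, Tag1, Tag2, Tag3 with Yes/No values
    corpus <- VCorpus(VectorSource(df$text))
    dtm    <- DocumentTermMatrix(corpus,
                                 control = list(tolower = TRUE, removePunctuation = TRUE))

    # naiveBayes() treats numeric columns as Gaussian, so recode the term
    # counts as categorical present/absent features instead
    to_factor <- function(x) factor(ifelse(x > 0, "present", "absent"),
                                    levels = c("absent", "present"))
    features  <- as.data.frame(lapply(as.data.frame(as.matrix(dtm)), to_factor))

    # one binary model per tag, all trained on the same features
    tags   <- c("Tag1", "Tag2", "Tag3")
    models <- lapply(tags, function(tag) naiveBayes(features, factor(df[[tag]]), laplace = 1))
    names(models) <- tags

    # score a new text against the training vocabulary, then threshold
    new_dtm <- DocumentTermMatrix(VCorpus(VectorSource("Some new text to classify")),
                                  control = list(dictionary = Terms(dtm),
                                                 removePunctuation = TRUE))
    new_features <- as.data.frame(lapply(as.data.frame(as.matrix(new_dtm)), to_factor))
    probs <- sapply(models, function(m) predict(m, new_features, type = "raw")[, "Yes"])
    probs          # named vector: P(tag = "Yes" | text) for every tag
    probs >= 0.5   # tag assignment after applying a threshold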

As always in probabilistic modelling, the question is whether you assume that your tags are independent or not. In the spirit of Naive Bayes, the independence assumption would be very natural; in that case, 2. would be the way to go. If the independence assumption is not justified and you are afraid of the combinatorial explosion, you can use a standard Bayesian network. As long as you keep certain independence assumptions in its structure, performance will not suffer.
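
For comparison, a sketch of option 1: collapse the tag columns into one joint label and fit a single model. This reuses the features and new_features objects from the sketch above; note that only tag combinations seen in the training data can ever be predicted, which is the combinatorial-explosion problem in practice:

    # joint class label per document, e.g. "Yes/No/No"
    joint_label <- factor(paste(df$Tag1, df$Tag2, df$Tag3, sep = "/"))
    joint_model <- naiveBayes(features, joint_label, laplace = 1)

    # joint posterior over the tag combinations observed in training
    predict(joint_model, new_features, type = "raw")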

However, you could also take a mixed approach.

  1. You could use a hierarchical Naive Bayes model. If there is some logical structure in the tags, you can introduce a parent variable for the classes. Basically, that variable would take a value tag1/tag2 if both tags occur together.
  2. The basic idea can be extended towards a latent variable you do not observe. This can be trained using an EM scheme. It will slightly impact your training performance, as you need to run the training for multiple iterations, but it will probably give you the best results.

http://link.springer.com/article/10.1007%2Fs10994-006-6136-2#/page-1

Upvotes: 1
