Alborz Gharabaghi

Reputation: 11

How do I train a classifier with multi-label data?

I am trying to train a classifier that takes a news headline as input and outputs the tags that fit that headline. My data consists of news headlines as the input variables and the meta-tags for those headlines as the output variables.

I one-hot encoded both the headlines and their corresponding meta-tags into two separate CSVs, then combined them into one large data frame. X_train is a 5573x958 NumPy array for the headline words, and y_train is a 5573x843 NumPy array for the tags.

[Image: pandas DataFrame containing my data in one-hot encoded form]

The goal is to feed in a headline and get the tags most related to that headline as the output. The problem I have is the following.

X_train = train_set.iloc[:, :958].values
X_train.shape
# (5573, 958)

y_train = train_set.iloc[:, 958:].values
y_train.shape
# (5573, 843)

from sklearn.naive_bayes import MultinomialNB
nb_clf = MultinomialNB().fit(X_train, y_train)

When I train it using a naive Bayes classifier, I get the following error message:

bad input shape (5573, 843)

From what I have researched, the only way I can have multi-label target values is by one-hot encoding them. When I tried LabelEncoder() or MultiLabelBinarizer(), I had to specify the name of each column to be binarized, and with over 800 columns (words) I could not figure out how to do it. So I just one-hot encoded them, which I believe gives the same result, but the classifier does not accept it as input. Any suggestions on how I can fix this?

Upvotes: 0

Views: 183

Answers (1)

Khelifi Aymen

Reputation: 132

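MultinomialNB expects a 1-D target, which is why a 2-D y_train of shape (5573, 843) is rejected. You can use scikit-learn's multi-output classification instead: MultiOutputClassifier fits one classifier per target column. Here is an example: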

from sklearn.multioutput import MultiOutputClassifier
from sklearn.naive_bayes import MultinomialNB
nb_clf = MultiOutputClassifier(MultinomialNB()).fit(X_train, y_train)
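
To check the behaviour without the full data set, here is a minimal sketch on a tiny made-up data set. The arrays X_toy and y_toy are hypothetical stand-ins for X_train and y_train; only the shapes matter. It shows that the wrapped classifier accepts a 2-D indicator matrix as the target and returns predictions of the same width.

```python
import numpy as np
from sklearn.multioutput import MultiOutputClassifier
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data: 6 samples, 4 binary word features, 2 binary tags.
X_toy = np.array([
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
])
y_toy = np.array([
    [1, 0],
    [0, 1],
    [1, 1],
    [0, 0],
    [1, 0],
    [0, 1],
])

# One MultinomialNB is fitted per column of y_toy.
clf = MultiOutputClassifier(MultinomialNB()).fit(X_toy, y_toy)

pred = clf.predict(X_toy)
print(pred.shape)            # one 0/1 prediction per sample and per tag
print(len(clf.estimators_))  # number of fitted per-tag classifiers
```

With 843 tag columns this fits 843 separate models, so training time grows with the number of tags.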

For details, see the documentation for sklearn.multioutput.MultiOutputClassifier.
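
On the MultiLabelBinarizer point in the question: it does not need column names. It takes one list of tags per sample and builds the indicator matrix itself. The sketch below assumes the raw headlines and tag lists are still available; the example headlines, the tags, and the helper top_tags are made up for illustration and are not from the original data. It also shows one possible way to rank tags for a new headline using predict_proba.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.multioutput import MultiOutputClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical raw data: one headline and one list of tags per sample.
headlines = [
    "stocks rally as markets rebound",
    "team wins championship final",
    "markets fall on bank fears",
    "star striker joins new team",
]
tags = [["finance"], ["sports"], ["finance", "banking"], ["sports", "transfers"]]

# Binary bag-of-words features, similar to one-hot encoding the words.
vectorizer = CountVectorizer(binary=True)
X = vectorizer.fit_transform(headlines)

# Builds a (n_samples, n_tags) 0/1 matrix; no column names are required.
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(tags)

clf = MultiOutputClassifier(MultinomialNB()).fit(X, Y)

def top_tags(headline, k=2):
    """Return the k tags with the highest predicted probability (hypothetical helper)."""
    x = vectorizer.transform([headline])
    # predict_proba returns one (n_samples, n_classes) array per tag;
    # column 1 is P(tag present), assuming every tag has both 0 and 1 in training.
    probs = np.array([p[0, 1] for p in clf.predict_proba(x)])
    order = np.argsort(probs)[::-1][:k]
    return [mlb.classes_[i] for i in order]

print(top_tags("markets rally on bank news"))
```

If a tag never appears (or always appears) in the training data, its classifier sees a single class and the column-1 lookup no longer holds, so such tags would need to be filtered out first.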

Upvotes: 1
