How to add more features in multi-class text classification?

Question

I have a retail dataset with the columns product_description, price, supplier, and category. So far I have used only product_description as the feature:

from sklearn import metrics, model_selection, naive_bayes, preprocessing
from sklearn.feature_extraction.text import TfidfVectorizer

# split the dataset into training and validation datasets 
train_x, valid_x, train_y, valid_y = model_selection.train_test_split(df['product_description'], df['category'])

# label encode the target variable; fit on all labels once, then reuse the same mapping
encoder = preprocessing.LabelEncoder()
encoder.fit(df['category'])
train_y = encoder.transform(train_y)
valid_y = encoder.transform(valid_y)

tfidf_vect = TfidfVectorizer(analyzer='word', token_pattern=r'\w{1,}', max_features=5000)
# learn the vocabulary from the training split only, so validation text does not leak in
tfidf_vect.fit(train_x)
xtrain_tfidf = tfidf_vect.transform(train_x)
xvalid_tfidf = tfidf_vect.transform(valid_x)

classifier = naive_bayes.MultinomialNB().fit(xtrain_tfidf, train_y)

# predict the labels on validation dataset
predictions = classifier.predict(xvalid_tfidf)
metrics.accuracy_score(valid_y, predictions)  # ~20%, very low

Since the accuracy is very low, I want to add supplier and price as features too. How can I incorporate them into the code?

I have tried other classifiers such as logistic regression, SVM, and random forest, but they gave almost the same result.


Answers (1)
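
One possible way to use supplier and price alongside the description is scikit-learn's ColumnTransformer, which applies a different transformer to each column and concatenates the results into one feature matrix. The sketch below is only an illustration under assumptions: the small DataFrame is made up so the example runs on its own, supplier is treated as a categorical column and price as a numeric one, and logistic regression stands in for the classifier. It is not a tuned solution and makes no claim about the accuracy you will get on the real data.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Tiny made-up frame (assumption), only so the sketch runs on its own.
# Replace it with the real retail DataFrame.
df = pd.DataFrame({
    "product_description": [
        "red cotton t-shirt", "blue denim jeans", "wireless optical mouse",
        "mechanical gaming keyboard", "green wool sweater", "usb-c charging cable",
    ],
    "price": [9.99, 39.50, 14.00, 79.00, 45.00, 7.25],
    "supplier": ["acme", "acme", "globex", "globex", "acme", "initech"],
    "category": ["clothing", "clothing", "electronics",
                 "electronics", "clothing", "electronics"],
})

features = ColumnTransformer([
    # The text column is named by a plain string so the vectorizer gets 1-D input.
    ("text",
     TfidfVectorizer(analyzer="word", token_pattern=r"\w{1,}", max_features=5000),
     "product_description"),
    # Categorical and numeric columns are named by lists so they arrive as 2-D input.
    ("supplier", OneHotEncoder(handle_unknown="ignore"), ["supplier"]),
    ("price", StandardScaler(), ["price"]),
])

# The pipeline fits the feature transformers and the classifier together,
# so the same preprocessing is applied at fit time and at predict time.
model = Pipeline([
    ("features", features),
    ("clf", LogisticRegression(max_iter=1000)),
])

X = df[["product_description", "price", "supplier"]]
y = df["category"]

model.fit(X, y)
predictions = model.predict(X)
```

On the real data you would call train_test_split on the DataFrame and the target first, fit the pipeline on the training part, and score it on the validation part. Note that StandardScaler can produce negative values, which MultinomialNB does not accept; if you want to keep naive Bayes, a non-negative scaling such as MinMaxScaler is one option for the price column.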
