Reputation: 969
how can i associate my tfidf matrix with a category ? for example i have the below data set
**ID** **Text** **Category**
1 jake loves me more than john loves me Romance
2 july likes me more than robert loves me Friendship
3 He likes videogames more than baseball Interest
once i calculate tfidf for each and every sentence by taking 'Text' column as my input, how would i be able to train the system to categorize that row of the matrix to be associated with my category above so that i would be able to reuse for my test data ?
using the above train dataset , when i pass a new sentence 'julie is a lovely person', i would like that sentence to be categorized into single or multiple pre-defined categories as above.
I have used this link Keep TFIDF result for predicting new content using Scikit for Python as my starting point to solve this issue but i was not able to understand on how to map tfidf matrix for a sentence to a category
Upvotes: 0
Views: 799
Reputation: 40963
It looks like you already vectorised the text, i.e. already converted the text to numbers so that you can use scinkit-learns classifiers. Now the next step is to train a classifier. You can follow this link. It looks like this:
Vectorization
from sklearn.feature_extraction.text import CountVectorizer
count_vect = CountVectorizer()
X_train = count_vect.fit_transform(your_text)
Train classifier
from sklearn.naive_bayes import MultinomialNB
clf = MultinomialNB().fit(X_train, y_train)
Predict on new docs:
docs_new = ['God is love', 'OpenGL on the GPU is fast']
X_new = count_vect.transform(docs_new)
predicted = clf.predict(X_new)
Upvotes: 1