Text Categorization Python with pre-trained data

Question

how can i associate my tfidf matrix with a category ? for example i have the below data set

**ID**        **Text**                                     **Category**
   1     jake loves me more than john loves me               Romance
   2     july likes me more than robert loves me             Friendship
   3     He likes videogames more than baseball              Interest

once i calculate tfidf for each and every sentence by taking 'Text' column as my input, how would i be able to train the system to categorize that row of the matrix to be associated with my category above so that i would be able to reuse for my test data ?

using the above train dataset , when i pass a new sentence 'julie is a lovely person', i would like that sentence to be categorized into single or multiple pre-defined categories as above.

I have used this link Keep TFIDF result for predicting new content using Scikit for Python as my starting point to solve this issue but i was not able to understand on how to map tfidf matrix for a sentence to a category

elyase · Accepted Answer

It looks like you already vectorised the text, i.e. already converted the text to numbers so that you can use scinkit-learns classifiers. Now the next step is to train a classifier. You can follow this link. It looks like this:

Vectorization

from sklearn.feature_extraction.text import CountVectorizer
count_vect = CountVectorizer()
X_train = count_vect.fit_transform(your_text)

Train classifier

from sklearn.naive_bayes import MultinomialNB
clf = MultinomialNB().fit(X_train, y_train)

Predict on new docs:

docs_new = ['God is love', 'OpenGL on the GPU is fast']
X_new = count_vect.transform(docs_new)
predicted = clf.predict(X_new)

Text Categorization Python with pre-trained data

Answers (1)

Related Questions