Passing TFIDF Feature Vector to a SGDClassifier from sklearn

Question

import numpy as np
from sklearn import linear_model
X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
Y = np.array(['C++', 'C#', 'java','python'])
clf = linear_model.SGDClassifier()
clf.fit(X, Y)
print (clf.predict([[1.7, 0.7]]))
#python

I am trying to predict a label from Y for a test case after training the classifier on the data in X. My problem is that I want to replace the training set X with TF-IDF feature vectors. How can I do that? Roughly, I want to do something like this:

import numpy as np
from sklearn import linear_model
X = np.array_str([['abcd', 'efgh'], ['qwert', 'yuiop'], ['xyz','abc'],['opi', 'iop']])
Y = np.array(['C++', 'C#', 'java','python'])
clf = linear_model.SGDClassifier()
clf.fit(X, Y)

wonderkid2 · Accepted Answer

You should look into the TfidfVectorizer in scikit-learn. I'll presume that X is a list of texts to be classified.

from sklearn.feature_extraction.text import TfidfVectorizer
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(X)

And then use X_train as the new X to train your classifier on.

from sklearn import linear_model
clf = linear_model.SGDClassifier()
clf.fit(X_train, Y)
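For completeness, here is a minimal end-to-end sketch of the steps above. It makes a few assumptions that are not in the question: each row of strings is joined into a single space-separated document (TfidfVectorizer expects one string per sample, not a nested list), random_state=0 is set only to make runs repeatable, and the test text 'opi iop' is a made-up example. The key point is that new text has to go through the same fitted vectorizer with transform (not fit_transform) before calling predict.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier

# Toy data from the question; joining each row into one document is an assumption.
rows = [['abcd', 'efgh'], ['qwert', 'yuiop'], ['xyz', 'abc'], ['opi', 'iop']]
X = [' '.join(row) for row in rows]   # one string per training sample
Y = ['C++', 'C#', 'java', 'python']

# Learn the vocabulary and IDF weights, and build the sparse TF-IDF matrix.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(X)

# random_state is fixed here only for repeatability (an illustrative choice).
clf = SGDClassifier(random_state=0)
clf.fit(X_train, Y)

# Reuse the fitted vectorizer so the test vector has the same columns as X_train.
X_test = vectorizer.transform(['opi iop'])
prediction = clf.predict(X_test)
print(prediction)
```

With only one example per class this is just a demonstration of the plumbing; on real data you would want many documents per label.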
