Reputation: 5563
I have a working SVM, and the CountVectorizer works fine when the input to the transform function is a list of strings. However, if I pass just one string to it, the vectorizer iterates through each character in the string and vectorizes each one, even though I set the analyzer parameter to word when constructing the CountVectorizer.
for x in range(0,3):
    test=raw_input("Type a message to classify: ")
    v=vectorizer.transform(test).toarray()
    print(v)
    print(len(v))
    print(svm.predict(vectorizer.transform(test).toarray()))
I'm able to fix this issue by changing the second line in the above code to:
test=[raw_input("Type a message to classify: ")]
But it seems strange to need a 1-item list. Is there a better way to do this without constructing a list?
Upvotes: 1
Views: 1429
Reputation: 9405
It expects a list or array of documents, so when you pass in a single string it assumes that each element of that string is a document (i.e., a character).
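To see why this happens, here is a minimal pure-Python sketch of the same behaviour. The tokenize_documents helper is hypothetical and only stands in for "something that loops over its input and treats each item as a document"; it is not how CountVectorizer is actually implemented.

```python
def tokenize_documents(documents):
    """Treat every item of `documents` as one document and split it into words.

    Hypothetical stand-in for a vectorizer: it simply iterates over its input.
    """
    return [doc.split() for doc in documents]


message = "hello world"

# A bare string is itself iterable, so each character becomes a "document".
as_string = tokenize_documents(message)

# Wrapping the string in a list gives exactly one document with two words.
as_list = tokenize_documents([message])

print(len(as_string))  # number of "documents" seen when passing a bare string
print(as_list)         # the single document, split into words
```

So the 1-item list is not a workaround; it is just how you say "one document" to an API that takes a collection of documents.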
Try changing svm.predict(vectorizer.transform(test).toarray())
to svm.predict(vectorizer.transform([test]).toarray())
PS: The toarray() part is not going to scale well once you use a real-world corpus. SVMs in sklearn can operate on sparse matrices, so I'd drop that part altogether.
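Putting both suggestions together, a rough sketch might look like the following. The training messages, labels, and the classify helper are made up for illustration, and I'm assuming a linear-kernel SVC since the question doesn't say which SVM class is used. As far as I know, recent scikit-learn versions also reject a bare string passed to transform with a ValueError rather than silently iterating over characters, so wrapping the message in a list is needed either way.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import SVC

# Toy training data, invented purely for this example.
train_docs = [
    "free money now",
    "win free prize",
    "meeting at noon",
    "lunch at noon tomorrow",
]
train_labels = [1, 1, 0, 0]  # 1 = spam, 0 = not spam (hypothetical labels)

vectorizer = CountVectorizer(analyzer="word")
X_train = vectorizer.fit_transform(train_docs)  # scipy sparse matrix

svm = SVC(kernel="linear")  # assumption: the question's SVM is similar
svm.fit(X_train, train_labels)  # fit accepts the sparse matrix directly


def classify(message):
    """Vectorize one message as a 1-item list and predict without toarray()."""
    features = vectorizer.transform([message])  # shape: (1, vocabulary size)
    return features, svm.predict(features)


features, prediction = classify("free prize money")
print(features.shape)
print(prediction)
```

The feature matrix stays sparse from transform through predict, so memory use grows with the number of non-zero counts rather than with the full vocabulary size.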
Upvotes: 2