Reputation: 1225
Currently, I am working with sklearn.pipeline, which is just wonderful. Here is an example:
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(train.data, train.target)
labels = model.predict(test.data)
(The data comes from train = fetch_20newsgroups(subset='train', categories=categories), with categories = ['talk.religion.misc', 'soc.religion.christian', 'sci.space', 'comp.graphics'].)
However, my understanding is still very vague. I would like to know how the same thing would look if done step by step, without the pipeline. Here is what I tried, but it failed:
from sklearn.datasets import fetch_20newsgroups
categories = ['talk.religion.misc', 'soc.religion.christian', 'sci.space','comp.graphics']
train = fetch_20newsgroups(subset='train', categories=categories)
from sklearn.feature_extraction.text import TfidfVectorizer
model1=TfidfVectorizer()
X=model1.fit_transform(train.data)
from sklearn.naive_bayes import MultinomialNB
model2=MultinomialNB
model2.fit(....)
At this point, I don't know what to do next, because the shape of X does not seem suitable for model2.
For further context, see the book at this link, page 406 of 548.
*** Please pardon my silly question. I know I can do this with a pipeline, but I just want to try it by hand.
Upvotes: 2
Views: 392
Reputation: 16966
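You are almost there! You need to use MultinomialNB() instead of MultinomialNB. Without the parentheses, model2 is the class itself rather than an instance of it, so there is no estimator object to fit. The shape of X is fine as it is.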
Try the following procedure.
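from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
categories = ['talk.religion.misc', 'soc.religion.christian', 'sci.space','comp.graphics']
train = fetch_20newsgroups(subset='train', categories=categories)
# Assumption: the test split is loaded the same way as the training split
test = fetch_20newsgroups(subset='test', categories=categories)
# Step 1: learn the vocabulary and turn the training documents into a TF-IDF matrix
model1=TfidfVectorizer()
X=model1.fit_transform(train.data)
# Step 2: create a classifier instance (note the parentheses) and fit it
model2=MultinomialNB()
model2.fit(X, train.target)
# Step 3: reuse the fitted vectorizer with transform (not fit_transform), then predict
model2.predict(model1.transform(test.data))
# array([2, 1, 1, ..., 2, 1, 1])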
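To check that the manual steps really do the same thing as the pipeline, here is a minimal sketch that runs both routes side by side and compares the predictions. It assumes a tiny made-up corpus (train_docs, train_labels, test_docs are hypothetical stand-ins for train.data, train.target and test.data) so that it runs without downloading the 20 newsgroups data.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus standing in for train.data / train.target / test.data
train_docs = [
    "the rocket reached orbit around the moon",
    "nasa launched a new satellite into space",
    "the gpu renders images and 3d graphics",
    "opengl shaders draw polygons on the screen",
]
train_labels = [0, 0, 1, 1]  # 0 = space, 1 = graphics
test_docs = ["a satellite in orbit", "rendering 3d polygons"]

# Manual route: vectorize, fit the classifier, then transform and predict
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_docs)  # sparse matrix: one row per document
classifier = MultinomialNB()
classifier.fit(X_train, train_labels)
manual_pred = classifier.predict(vectorizer.transform(test_docs))

# Pipeline route: the same two estimators chained together
pipeline = make_pipeline(TfidfVectorizer(), MultinomialNB())
pipeline.fit(train_docs, train_labels)
pipeline_pred = pipeline.predict(test_docs)

print(X_train.shape)                               # (n_documents, n_vocabulary_terms)
print(np.array_equal(manual_pred, pipeline_pred))  # do both routes agree?
```

The thing to notice is that the test documents go through transform, not fit_transform, so they are encoded with the vocabulary learned from the training data. The pipeline takes care of that distinction for you when you call predict.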
Upvotes: 2