sdgaw erzswer

Reputation: 2382

Scikit-learn tfidf vectorizer in minibatches?

I've been trying to apply the tf-idf heuristic to a large corpus.

Can I read the documents iteratively and call

vectorizer.fit()

in each iteration? Does each call take into account only the current batch, or does it remember the previous ones?

Thanks!

Upvotes: 3

Views: 589

Answers (1)

benbo

Reputation: 1528

The solution to your problem will depend on your particular application. You could consider gensim's tf-idf implementation, which is more memory-efficient: it can stream the corpus one document at a time, so it does not need to keep the entire corpus in memory, as this post explains.
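On the question itself: as far as I know, each call to TfidfVectorizer's fit() relearns the vocabulary and idf weights from scratch, so only the last batch would count. To show what an incremental version has to keep track of, here is a minimal pure-Python sketch. It is not scikit-learn's or gensim's API: the class StreamingTfidf, its partial_fit method, and the read_batches generator are hypothetical names made up for illustration, and the whitespace tokenizer is a simplifying assumption. The idf uses the smoothed form ln((1 + n) / (1 + df)) + 1.

```python
import math
from collections import Counter


class StreamingTfidf:
    """Toy tf-idf (hypothetical class) that learns document frequencies per minibatch."""

    def __init__(self):
        self.n_docs = 0            # documents seen so far, across all batches
        self.doc_freq = Counter()  # term -> number of documents containing it

    def partial_fit(self, batch):
        # Update the running statistics; earlier batches are not forgotten.
        for doc in batch:
            self.n_docs += 1
            # set() so that each term is counted at most once per document
            self.doc_freq.update(set(doc.lower().split()))
        return self

    def idf(self, term):
        # Smoothed idf: ln((1 + n) / (1 + df)) + 1
        return math.log((1 + self.n_docs) / (1 + self.doc_freq[term])) + 1.0

    def transform(self, doc):
        # Raw term counts weighted by idf, then L2-normalised.
        counts = Counter(doc.lower().split())
        weights = {t: c * self.idf(t) for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        return {t: w / norm for t, w in weights.items()} if norm else weights


def read_batches():
    # Stand-in for reading a large corpus from disk in chunks (assumption).
    yield ["the cat sat", "the dog barked"]
    yield ["the bird sang"]


model = StreamingTfidf()
for batch in read_batches():
    model.partial_fit(batch)   # only one batch is in memory at a time

vec = model.transform("the cat")
```

Only the document count and the per-term document frequencies are kept between batches, so memory grows with the vocabulary rather than with the corpus. The weights are only final once every batch has been seen, which is why transform is called after the loop.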

Upvotes: 1
