Nilan Saha

Reputation: 249

How to perform LSA on a huge dataset that does not fit into memory with Python?

I have seen similar questions before, but I haven't found a solution that works for my case specifically. I have a million documents, and let's say each document has around 20-30 words in it. I want to lemmatize, remove stopwords, build a tf-idf matrix using a vocabulary of 100,000 words, and then do SVD on it. How can I do this in Python within a reasonable time and without running into memory errors?
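For reference, the plain in-memory version of the pipeline I have in mind looks roughly like this (just a sketch using scikit-learn; the corpus and parameters are placeholders):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

# Placeholder corpus; the real one is ~1M documents of 20-30 words each.
# (Lemmatization, e.g. with spaCy or NLTK, would happen before this step.)
docs = [
    "the cat sat on the mat",
    "dogs and cats are common pets",
    "a quick brown fox jumps over the lazy dog",
    "machine learning on text documents",
]

# Stopword removal + cap the vocabulary at 100,000 terms
vectorizer = TfidfVectorizer(stop_words="english", max_features=100_000)
tfidf = vectorizer.fit_transform(docs)   # sparse (n_docs x vocab) matrix

# LSA = truncated SVD on the tf-idf matrix
svd = TruncatedSVD(n_components=2)       # would be ~100-300 on the real data
lsa = svd.fit_transform(tfidf)
```

This works on toy data, but I'm not sure how to scale it to the full corpus.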

If someone has any idea that would be great.

Upvotes: 0

Views: 90

Answers (1)

JeanWolf

Reputation: 11

There is an algorithm called SPIMI (single-pass in-memory indexing). It basically involves going through your data in a single pass and writing a partial index to disk every time you run out of memory; you then merge all the on-disk blocks into one large index. I've implemented this for a project here.
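A rough sketch of the idea (not my actual project code; the block size and file layout are just illustrative):

```python
import os
import pickle
from collections import defaultdict

def spimi_invert(token_stream, block_dir, max_postings=1_000_000):
    """Single-pass in-memory indexing: accumulate a partial inverted index
    in memory and flush it to disk as a sorted block whenever it gets big
    (a stand-in for 'running out of memory')."""
    os.makedirs(block_dir, exist_ok=True)
    block_paths, index, n_postings = [], defaultdict(list), 0

    for doc_id, term in token_stream:         # (doc_id, term) pairs, one pass
        index[term].append(doc_id)
        n_postings += 1
        if n_postings >= max_postings:
            block_paths.append(write_block(index, block_dir, len(block_paths)))
            index, n_postings = defaultdict(list), 0

    if index:                                  # flush whatever is left
        block_paths.append(write_block(index, block_dir, len(block_paths)))
    return block_paths

def write_block(index, block_dir, block_no):
    """Write one partial index to disk with its terms sorted, so the blocks
    can later be combined with a k-way merge."""
    path = os.path.join(block_dir, f"block_{block_no}.pkl")
    with open(path, "wb") as f:
        pickle.dump(sorted(index.items()), f)
    return path

def merge_blocks(block_paths):
    """Merge the on-disk blocks into one index. Simplified: a real merge
    streams all blocks term-by-term instead of loading each one whole."""
    merged = defaultdict(list)
    for path in block_paths:
        with open(path, "rb") as f:
            for term, doc_ids in pickle.load(f):
                merged[term].extend(doc_ids)
    return merged
```

From the merged index (term to document postings) you can compute tf-idf weights as a sparse matrix and then run a truncated SVD on that, without ever holding the whole corpus in memory at once.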

Upvotes: 1
