Reputation: 167
I'm using scikit-learn's Latent Dirichlet Allocation for topic modeling. The LDA object (lda_vectorizer below) is fitted to a corpus of text. Now I want to transform a single text with it to get the topic weights for that text.
def append_lda_features(df, lda_vectorizer, tfidf_vector):
    import pandas as pd
    from time import time

    st = time()
    lda_vector = lda_vectorizer.transform(tfidf_vector)
    print(time() - st)  # time spent in the transform call

    lda_vector = pd.DataFrame(lda_vector)
    lda_vector.columns = ['lda_word_' + str(i)
                          for i in range(lda_vectorizer.n_components)]
    return pd.concat([df, lda_vector], axis=1)
This prints values around 0.67 seconds, which seems really high considering that my LDA has only 15 components and the vectorizer has a 100,000-token vocabulary:
LatentDirichletAllocation(n_components=15, n_jobs=30, verbose=1)
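For reference, the setup looks roughly like this (a minimal sketch: the toy corpus and the TfidfVectorizer settings are stand-ins; only the LDA parameters match my real code):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Stand-in corpus; the real one is large enough to yield ~100000 tokens.
documents = ["first example document", "second example document",
             "third example document about topics"]

tfidf = TfidfVectorizer(max_features=100000)
tfidf_matrix = tfidf.fit_transform(documents)

lda_vectorizer = LatentDirichletAllocation(n_components=15, n_jobs=30,
                                           verbose=1)
lda_vectorizer.fit(tfidf_matrix)

# Later, a single text is vectorized and handed to append_lda_features:
tfidf_vector = tfidf.transform(["one new text to score"])  # shape (1, vocab)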
What should I do to make the LDA work faster?
Upvotes: 0
Views: 764
Reputation: 167
When you transform a single text vector with the LDA model, set n_jobs = 1 first. That way the transform doesn't try to parallelize the work, which avoids the noticeable overhead of spinning up the parallel workers.
def append_lda_features(df, lda_vectorizer, tfidf_vector):
    import pandas as pd
    from time import time

    st = time()
    lda_vectorizer.n_jobs = 1  # single document: skip the parallel backend
    lda_vector = lda_vectorizer.transform(tfidf_vector)
    print(time() - st)

    lda_vector = pd.DataFrame(lda_vector)
    lda_vector.columns = ['lda_word_' + str(i)
                          for i in range(lda_vectorizer.n_components)]
    return pd.concat([df, lda_vector], axis=1)
This one gives me about 0.01 seconds instead of 0.6.
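If you don't want to permanently switch the fitted model to single-threaded mode (say, you still run big batch transforms elsewhere), you can keep the change local. A small sketch, assuming the same lda_vectorizer and tfidf_vector as above:

def transform_single(lda_vectorizer, tfidf_vector):
    old_n_jobs = lda_vectorizer.n_jobs
    lda_vectorizer.n_jobs = 1  # one row: the parallel overhead dominates
    try:
        return lda_vectorizer.transform(tfidf_vector)
    finally:
        lda_vectorizer.n_jobs = old_n_jobs  # restore for batch use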
Upvotes: 0