Reputation: 71
I have a large news dataset containing approximately 300k news descriptions. I am applying dynamic topic modeling using Gensim's sequential LDA model (LdaSeqModel) with 11 yearly time slices. The average length of each news article is around 3,200 words.
I applied the model to a reduced dataset of 1,500 messages, divided into eight time slices, just to check how long training takes:
from gensim.models import ldaseqmodel

# 8 yearly slices: 7 x 200 documents + 1 x 100 = 1,500 documents
time_slices = [200, 200, 200, 200, 200, 200, 200, 100]
ldaseq = ldaseqmodel.LdaSeqModel(corpus=corpus, id2word=id2word, time_slice=time_slices, num_topics=5)
ldaseq.save('/content/drive/My Drive/News/lda_model_seq2010.model')
The problem I am facing is that the 1,500 messages with eight time slices took almost 80 minutes to train on a Colab TPU runtime with 12 GB of RAM. This means the dataset of nearly 300k messages will take an enormous amount of time to complete. Is there any solution to this issue, as I do not have resources other than Google Colab?
Note: I tried to find a logging mechanism to show a progress bar, but did not find anything suitable for DTM. I would appreciate some help.
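(For what it's worth: Gensim routes its training messages through Python's standard logging module rather than a progress bar, so enabling INFO-level logging before training may at least surface LdaSeqModel's per-iteration messages. A minimal sketch; the exact messages emitted depend on the Gensim version:)

import logging

# Show Gensim's INFO-level training messages (e.g. EM iteration updates)
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s',
                    level=logging.INFO)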
Upvotes: 1
Views: 395
Reputation: 45
Most probably the library does not support TPUs.
A quick Google search also suggests it lacks GPU support, so the TPU runtime is almost certainly running the model on its CPU anyway.
Try the same run on a plain CPU runtime; I think you will get a similar result.
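A sketch of timing that check, reusing corpus and id2word from the question (assumed to already be in memory):

import time
from gensim.models import ldaseqmodel

# Re-run the question's 1,500-document experiment on a CPU runtime;
# a duration close to the 80-minute TPU figure means the accelerator
# was never actually used
start = time.time()
ldaseq = ldaseqmodel.LdaSeqModel(corpus=corpus, id2word=id2word,
                                 time_slice=[200, 200, 200, 200, 200, 200, 200, 100],
                                 num_topics=5)
print('elapsed: %.1f min' % ((time.time() - start) / 60))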
Upvotes: 0