Reputation: 71
I have a large news dataset containing approximately 300k news descriptions. I am applying dynamic topic modeling using Gensim's sequential LDA model (LdaSeqModel) with 11 yearly time slices. The average length of each news article is around 3,200 words.
I applied the model to a reduced dataset of 1,500 messages, divided into eight time slices, just to check how long training takes:
from gensim.models import ldaseqmodel

# 8 yearly slices: 7 x 200 documents + 1 x 100 = 1,500 documents
time_slices = [200, 200, 200, 200, 200, 200, 200, 100]
ldaseq = ldaseqmodel.LdaSeqModel(corpus=corpus, id2word=id2word, time_slice=time_slices, num_topics=5)
ldaseq.save('/content/drive/My Drive/News/lda_model_seq2010.model')
The problem I am facing is that the 1,500 messages with eight time slices took almost 80 minutes to train on a Colab TPU runtime with 12 GB of RAM. This means the dataset of nearly 300k messages will take an enormous amount of time to complete. Is there any solution to this issue, as I do not have resources other than Google Colab?
Note: I tried to find a logging mechanism to show a progress bar, but did not find anything suitable for DTM. I would appreciate some help.
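(For what it's worth: Gensim routes its training messages through Python's standard logging module rather than a progress bar, so enabling INFO-level logging before training may at least surface LdaSeqModel's per-iteration messages. A minimal sketch; the exact messages emitted depend on the Gensim version:)

import logging

# Show Gensim's INFO-level training messages (e.g. EM iteration updates)
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s',
                    level=logging.INFO)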
Upvotes: 1
Views: 395
Reputation: 45
Most probably the library does not support TPUs.
A quick Google search also suggests it lacks GPU support, so the TPU runtime is almost certainly running the model on its CPU anyway.
Try the same run on a plain CPU runtime; I think you will get a similar result.
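A sketch of timing that check, reusing corpus and id2word from the question (assumed to already be in memory):

import time
from gensim.models import ldaseqmodel

# Re-run the question's 1,500-document experiment on a CPU runtime;
# a duration close to the 80-minute TPU figure means the accelerator
# was never actually used
start = time.time()
ldaseq = ldaseqmodel.LdaSeqModel(corpus=corpus, id2word=id2word,
                                 time_slice=[200, 200, 200, 200, 200, 200, 200, 100],
                                 num_topics=5)
print('elapsed: %.1f min' % ((time.time() - start) / 60))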
Upvotes: 0