Reputation: 77
I have 787 documents (speeches, stored as text files). Using the "textmineR" package, I fitted a topic model on them and got the 3 topics below:
topic  label      coherence  prevalence  top_terms
t_1    policy     0.092      37.374      policy, inflation, monetary, rate, federal, economic
t_2    financial  0.030      37.677      financial, banks, risk, capital, market, not
t_3    community  0.004      24.949      community, federal, reserve, more, return, mortgage
Can someone please suggest how I can assign each topic to the relevant document and create a data table like this:
Document Number  Topic
1                t_1
and so on.
Upvotes: 0
Views: 357
Reputation: 390
Glad you found the solution yourself and sorry I didn't see it sooner.
If you need to assign topics to new documents, you can also use predict.
Here's a reproducible example using your solution and predict.
library(textmineR)
# 'mycorpus' and `newcorpus` are disjoint character vectors of documents
mycorpus <- nih_sample$ABSTRACT_TEXT
newcorpus <- nih_sample$PROJECT_TITLE
# create a document term matrix for training
dtm <- CreateDtm(mycorpus)
# train an LDA topic model
lda <- FitLdaModel(dtm, k = 10, iterations = 200, burnin = 150)
# get the topic document assignments for your training data
lda$theta
# create a new document term matrix for new documents
new_dtm <- CreateDtm(newcorpus)
# predict handles vocabulary (mis)alignment for you
new_theta <- predict(lda, new_dtm, iterations = 200, burnin = 150)
Upvotes: 2
Reputation: 77
Found it: one can use the theta matrix returned by FitLdaModel. It has one row per speech (document) and one column per topic, and each value is the weight of that topic in that speech.
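To get from theta to the "Document Number / Topic" table asked for in the question, one option is to take the highest-weighted topic in each row. The snippet below is only a minimal sketch in base R: the small theta matrix is made-up data standing in for the real lda$theta, and the column names document and topic are hypothetical names chosen for illustration.

```r
# Toy stand-in for lda$theta: rows are documents, columns are topics,
# and each row sums to 1. The values are invented for illustration only.
theta <- matrix(
  c(0.70, 0.20, 0.10,
    0.15, 0.60, 0.25,
    0.05, 0.30, 0.65),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("1", "2", "3"), c("t_1", "t_2", "t_3"))
)

# For each document (row), find the column holding the largest weight
# and look up that column's topic name.
doc_topics <- data.frame(
  document = rownames(theta),
  topic = colnames(theta)[max.col(theta, ties.method = "first")],
  stringsAsFactors = FALSE
)

print(doc_topics)
```

With a fitted model, replacing the toy matrix with lda$theta should give one row per speech. Note that this keeps only the single strongest topic per document; a speech that mixes two topics almost equally will still be given just one label, so it can be worth keeping the full theta matrix alongside the table.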
Upvotes: 0