jalaj pathak

Reputation: 77

How to assign the topics retrieved via LDA in R using the "textmineR" package to specific documents

I have 787 documents (speeches, as text files). Using the "textmineR" package, I fitted an LDA model to them and got the 3 topics below:

 topic label      coherence   prevalence    top_terms
 t_1   policy     0.092       37.374        policy, inflation, monetary, rate, federal, economic
 t_2   financial  0.030       37.677        financial, banks, risk, capital, market, not
 t_3   community  0.004       24.949        community, federal, reserve, more, return, mortgage 

Can someone please suggest how I can assign each document its most relevant topic and create a data table like this:

Document Number          Topic
1                           t_1

and so on.

Upvotes: 0

Views: 357

Answers (2)

Tommy Jones

Reputation: 390

Glad you found the solution yourself and sorry I didn't see it sooner.

If you need to assign topics to new documents, you can also use predict().

Here's a reproducible example using your solution and predict.

library(textmineR)

# 'mycorpus' and 'newcorpus' are disjoint character vectors of documents
mycorpus <- nih_sample$ABSTRACT_TEXT

newcorpus <- nih_sample$PROJECT_TITLE

# create a document term matrix for training
dtm <- CreateDtm(mycorpus)

# train an LDA topic model
lda <- FitLdaModel(dtm, k = 10, iterations = 200, burnin = 150)

# get the topic document assignments for your training data
lda$theta

# create a new document term matrix for new documents
new_dtm <- CreateDtm(newcorpus)

# predict handles vocabulary (mis)alignment for you
new_theta <- predict(lda, new_dtm, iterations = 200, burnin = 150)

Upvotes: 2

jalaj pathak

Reputation: 77

Found it: one can use the theta matrix returned by FitLdaModel. It gives the weight of each topic in each speech (document), so the topic with the largest value in a document's row is that document's dominant topic.
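
To get the table asked for in the question, a minimal sketch along these lines should work. It assumes theta has one row per document and one column per topic; the small matrix below is made-up data standing in for lda$theta, and the names doc_topics and top_topic are just illustrative.

```r
# Toy document-topic matrix standing in for lda$theta
# (assumption: rows are documents, columns are topics; values are invented).
theta <- matrix(
  c(0.70, 0.20, 0.10,
    0.15, 0.60, 0.25,
    0.30, 0.30, 0.40),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("1", "2", "3"), c("t_1", "t_2", "t_3"))
)

# For each document (row), find the topic (column) with the largest weight.
top_topic <- colnames(theta)[max.col(theta, ties.method = "first")]

# Build the document/topic table, keeping the winning weight for reference.
doc_topics <- data.frame(
  document = rownames(theta),
  topic = top_topic,
  weight = apply(theta, 1, max),
  stringsAsFactors = FALSE
)

print(doc_topics)
```

With a fitted model, replacing the toy matrix with lda$theta should give one row per speech. Keeping the weight column is a deliberate choice: a document whose top weight is only slightly above the others does not really belong to a single topic.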

Upvotes: 0
