Reputation: 187
From these results:
library(stm)
labelTopics(gadarianFit, n = 15)
Topic 1 Top Words:
Highest Prob: immigr, illeg, legal, border, will, need, worri, work, countri, mexico, life, better, nation, make, worker
FREX: border, mexico, mexican, need, concern, fine, make, better, worri, nation, deport, worker, will, econom, poor
Lift: cross, racism, happen, other, continu, concern, deport, mexican, build, fine, econom, border, often, societi, amount
Score: immigr, border, need, will, mexico, illeg, mexican, worri, concern, legal, nation, fine, worker, better, also
Topic 2 Top Words:
Highest Prob: job, illeg, tax, pay, american, take, care, welfar, crime, system, secur, social, health, cost, servic
FREX: cost, health, servic, welfar, increas, loss, school, healthcar, job, care, medic, crime, social, violenc, educ
Lift: violenc, expens, opportun, cost, healthcar, loss, increas, gang, servic, medic, health, diseas, terror, school, lose
Score: job, welfar, crime, cost, tax, care, servic, increas, health, pay, school, loss, medic, healthcar, social
Topic 3 Top Words:
Highest Prob: peopl, come, countri, think, get, english, mani, live, citizen, learn, way, becom, speak, work, money
FREX: english, get, come, mani, back, becom, like, think, new, send, right, way, just, live, peopl
Lift: anyth, send, still, just, receiv, deserv, back, new, english, mani, get, busi, year, equal, come
Score: think, peopl, come, get, english, countri, mani, speak, way, send, back, money, becom, learn, live
How can I store the highest-probability words in a data frame with one column per topic and one row per word (n = 15 rows)?
Example of expected output:
topic1 topic2 topic3
immigr job peopl
illeg illeg come
Upvotes: 0
Views: 229
Reputation: 17823
In the labelTopics object, words are stored under prob. So you could try something like this:
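library(stm)
# labelTopics() returns a list; $prob is a matrix with one row per topic
topics <- labelTopics(gadarianFit, n = 15)
# transpose so each topic becomes a column, with one row per word rank
topics <- data.frame(t(topics$prob), stringsAsFactors = FALSE)
colnames(topics) <- paste0("topic", seq_len(ncol(topics)))
topics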
#> topic1 topic2 topic3
#> 1 immigr job peopl
#> 2 illeg illeg come
#> 3 legal tax countri
#> 4 border pay think
#> 5 will american get
#> 6 need take english
#> 7 worri care mani
#> 8 work welfar live
#> 9 countri crime citizen
#> 10 mexico system learn
#> 11 life secur way
#> 12 better social becom
#> 13 nation health speak
#> 14 make cost work
#> 15 worker servic money
Note that stm offers several ways of selecting the most important words per topic, including FREX, Lift and Score. In the labelTopics object these are stored under frex, lift and score, so you would simply have to change prob in my code to use one of them.
Type this to see them:
topics <- labelTopics(gadarianFit, n=15)
str(topics)
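If you want to switch between the word rankings without editing the code each time, the same transpose step can be wrapped in a small function. This is only a sketch: top_words_df and mock_labels are names I made up for illustration, not part of stm, and the mock list just imitates the prob and frex matrices of a labelTopics object so the example runs without a fitted model.

```r
# Hypothetical helper (not part of stm): convert one word matrix from a
# labelTopics-style list into a data frame with one column per topic.
top_words_df <- function(labels, type = c("prob", "frex", "lift", "score")) {
  type <- match.arg(type)
  words <- labels[[type]]  # topics in rows, word ranks in columns
  out <- as.data.frame(t(words), stringsAsFactors = FALSE)
  colnames(out) <- paste0("topic", seq_len(ncol(out)))
  rownames(out) <- NULL
  out
}

# Mock object standing in for labelTopics(gadarianFit, n = 2); the words
# are the first two entries per topic from the output in the question.
mock_labels <- list(
  prob = rbind(c("immigr", "illeg"), c("job", "illeg"), c("peopl", "come")),
  frex = rbind(c("border", "mexico"), c("cost", "health"), c("english", "get"))
)

top_words_df(mock_labels, "prob")
top_words_df(mock_labels, "frex")
```

With the real model, something like top_words_df(labelTopics(gadarianFit, n = 15), "frex") should give the FREX words in the same layout, assuming the object has the elements shown by str(topics).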
Upvotes: 1