rek

Reputation: 187

Convert the results of a function into a data frame

From these results:

library(stm)
labelTopics(gadarianFit, n = 15)

Topic 1 Top Words:
     Highest Prob: immigr, illeg, legal, border, will, need, worri, work, countri, mexico, life, better, nation, make, worker 
     FREX: border, mexico, mexican, need, concern, fine, make, better, worri, nation, deport, worker, will, econom, poor 
     Lift: cross, racism, happen, other, continu, concern, deport, mexican, build, fine, econom, border, often, societi, amount 
     Score: immigr, border, need, will, mexico, illeg, mexican, worri, concern, legal, nation, fine, worker, better, also 
Topic 2 Top Words:
     Highest Prob: job, illeg, tax, pay, american, take, care, welfar, crime, system, secur, social, health, cost, servic 
     FREX: cost, health, servic, welfar, increas, loss, school, healthcar, job, care, medic, crime, social, violenc, educ 
     Lift: violenc, expens, opportun, cost, healthcar, loss, increas, gang, servic, medic, health, diseas, terror, school, lose 
     Score: job, welfar, crime, cost, tax, care, servic, increas, health, pay, school, loss, medic, healthcar, social 
Topic 3 Top Words:
     Highest Prob: peopl, come, countri, think, get, english, mani, live, citizen, learn, way, becom, speak, work, money 
     FREX: english, get, come, mani, back, becom, like, think, new, send, right, way, just, live, peopl 
     Lift: anyth, send, still, just, receiv, deserv, back, new, english, mani, get, busi, year, equal, come 
     Score: think, peopl, come, get, english, countri, mani, speak, way, send, back, money, becom, learn, live 

How can I store the "Highest Prob" words in a data frame with one column per topic and one row per word (n = 15 rows)?

Example of expected output:

topic1 topic2 topic3
immigr job peopl
illeg illeg come

Upvotes: 0

Views: 229

Answers (1)

Vincent

Reputation: 17823

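In the object returned by labelTopics, the highest-probability words are stored in the prob element, a matrix with one row per topic. Transposing it gives one column per topic, so you could try something like this: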

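library(stm)
topics <- labelTopics(gadarianFit, n = 15)

# topics$prob has one row per topic; transpose so each topic becomes a column
topics <- data.frame(t(topics$prob), stringsAsFactors = FALSE)
colnames(topics) <- paste0("topic", seq_len(ncol(topics)))
topics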
#>     topic1   topic2  topic3
#> 1   immigr      job   peopl
#> 2    illeg    illeg    come
#> 3    legal      tax countri
#> 4   border      pay   think
#> 5     will american     get
#> 6     need     take english
#> 7    worri     care    mani
#> 8     work   welfar    live
#> 9  countri    crime citizen
#> 10  mexico   system   learn
#> 11    life    secur     way
#> 12  better   social   becom
#> 13  nation   health   speak
#> 14    make     cost    work
#> 15  worker   servic   money

Note that stm offers several ways of ranking the most important words per topic, including FREX, Lift, and Score. To use one of them, replace prob in my code with frex, lift, or score.

To see all of the elements, inspect the structure of the object:

topics <- labelTopics(gadarianFit, n=15)
str(topics)
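
If you want the same table for more than one ranking, the transpose-and-rename step can be wrapped in a small function. This is only a sketch: topic_words_df is a hypothetical helper, not part of stm, and mock_labels is a hand-built stand-in that assumes the same layout as the labelTopics output (one matrix per ranking, topics in rows), so the example runs without a fitted model.

```r
# Hypothetical helper (not part of stm): turn one word matrix of a
# labelTopics()-style list into a data frame with one column per topic.
topic_words_df <- function(labels, metric = "prob") {
  words <- labels[[metric]]
  if (is.null(words)) {
    stop("no element named '", metric, "' in the labels object")
  }
  # Rows are topics, so transpose to get one column per topic
  out <- as.data.frame(t(words), stringsAsFactors = FALSE)
  colnames(out) <- paste0("topic", seq_len(ncol(out)))
  rownames(out) <- NULL
  out
}

# Hand-built stand-in for a labelTopics() result (assumed layout:
# topics in rows, words in columns), using the first two words per topic.
mock_labels <- list(
  prob = matrix(c("immigr", "illeg",
                  "job",    "illeg",
                  "peopl",  "come"),
                nrow = 3, byrow = TRUE),
  frex = matrix(c("border",  "mexico",
                  "cost",    "health",
                  "english", "get"),
                nrow = 3, byrow = TRUE)
)

topic_words_df(mock_labels, "prob")
topic_words_df(mock_labels, "frex")
```

With a real model you would pass the labelTopics result instead of the mock, for example topic_words_df(labelTopics(gadarianFit, n = 15), "frex").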

Upvotes: 1
