Reputation: 1
I have fitted an stm topic model and have a problem with summary.estimateEffect: my data covers around 150 days, yet the summary prints regression estimates for only 10 of them.
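# K = 0 with spectral initialization lets stm choose the number of topics
parlPrevFit <- stm(documents = out$documents, vocab = out$vocab, K = 0,
                   prevalence = ~ s(day), max.em.its = 150,
                   data = out$meta, init.type = "Spectral")
# Regress the prevalence of four topics on a smooth function of day
prep <- estimateEffect(c(14, 40, 5, 41) ~ s(day), parlPrevFit,
                       metadata = meta, uncertainty = "Global")
summary(prep, topics = c(14, 40, 5, 41))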
Topic 14 coefficients: https://prnt.sc/105pg1a
Could anyone suggest how to print estimates for more than 10 days, please?
Upvotes: 0
Views: 249
Reputation: 11643
Instead of using summary(), which you don't have much control over, load the tidytext package and use tidy() instead.
Let's walk through an example where we train a topic model on Jane Austen's novels, treating each chapter as a document:
library(tidyverse)
library(tidytext)
library(stm)
#> stm v1.3.6 successfully loaded. See ?stm for help.
#> Papers, resources, and other materials at structuraltopicmodel.com
library(janeaustenr)
books <- austen_books() %>%
  group_by(book) %>%
  mutate(chapter = cumsum(str_detect(text, regex("^chapter ", ignore_case = TRUE)))) %>%
  ungroup() %>%
  filter(chapter > 0) %>%
  unite(document, book, chapter, remove = FALSE)
austen_sparse <- books %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words) %>%
  count(document, word) %>%
  cast_sparse(document, word, n)
#> Joining, by = "word"
Let's train a topic model with 6 topics (there are 6 books):
topic_model <- stm(
  austen_sparse,
  K = 6,
  init.type = "Spectral",
  verbose = FALSE
)
Let's make a data set to use in estimateEffect():
chapters <- books %>%
  group_by(document) %>%
  summarize(text = str_c(text, collapse = " ")) %>%
  ungroup() %>%
  inner_join(books %>%
    distinct(document, book))
#> Joining, by = "document"
chapters
#> # A tibble: 269 x 3
#> document text book
#> <chr> <chr> <fct>
#> 1 Emma_1 "CHAPTER I Emma Woodhouse, handsome, clever, and rich, with… Emma
#> 2 Emma_10 "CHAPTER X Though now the middle of December, there had yet… Emma
#> 3 Emma_11 "CHAPTER XI Mr. Elton must now be left to himself. It was n… Emma
#> 4 Emma_12 "CHAPTER XII Mr. Knightley was to dine with them--rather ag… Emma
#> 5 Emma_13 "CHAPTER XIII There could hardly be a happier creature in t… Emma
#> 6 Emma_14 "CHAPTER XIV Some change of countenance was necessary for e… Emma
#> 7 Emma_15 "CHAPTER XV Mr. Woodhouse was soon ready for his tea; and w… Emma
#> 8 Emma_16 "CHAPTER XVI The hair was curled, and the maid sent away, a… Emma
#> 9 Emma_17 "CHAPTER XVII Mr. and Mrs. John Knightley were not detained… Emma
#> 10 Emma_18 "CHAPTER XVIII Mr. Frank Churchill did not come. When the t… Emma
#> # … with 259 more rows
Now let's estimate regressions from our topic model, for our first three topics and our data set of "chapter" documents:
effects <- estimateEffect(1:3 ~ book, topic_model, chapters)
summary(effects)
#>
#> Call:
#> estimateEffect(formula = 1:3 ~ book, stmobj = topic_model, metadata = chapters)
#>
#>
#> Topic 1:
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 0.018033 0.023726 0.760 0.448
#> bookPride & Prejudice 0.799555 0.037140 21.528 <2e-16 ***
#> bookMansfield Park -0.006387 0.032662 -0.196 0.845
#> bookEmma 0.003188 0.033393 0.095 0.924
#> bookNorthanger Abbey 0.002535 0.039017 0.065 0.948
#> bookPersuasion 0.025725 0.044281 0.581 0.562
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#>
#> Topic 2:
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 0.015289 0.016478 0.928 0.354
#> bookPride & Prejudice 0.001785 0.023489 0.076 0.939
#> bookMansfield Park 0.001616 0.024664 0.066 0.948
#> bookEmma 0.892516 0.037833 23.591 <2e-16 ***
#> bookNorthanger Abbey 0.006032 0.031530 0.191 0.848
#> bookPersuasion -0.001142 0.030052 -0.038 0.970
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#>
#> Topic 3:
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 0.0196151 0.0225115 0.871 0.3844
#> bookPride & Prejudice -0.0004909 0.0286302 -0.017 0.9863
#> bookMansfield Park 0.0148960 0.0341272 0.436 0.6628
#> bookEmma -0.0004006 0.0301741 -0.013 0.9894
#> bookNorthanger Abbey 0.8730570 0.0457994 19.063 <2e-16 ***
#> bookPersuasion 0.1030537 0.0495148 2.081 0.0384 *
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
This example doesn't have the printing limitation you mentioned, but you can avoid any problem like that by using tidy() instead, which returns the actual content of the regressions as a data frame:
tidy(effects)
#> # A tibble: 18 x 6
#> topic term estimate std.error statistic p.value
#> <int> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 1 (Intercept) 0.0179 0.0238 0.753 4.52e- 1
#> 2 1 bookPride & Prejudice 0.799 0.0373 21.4 1.09e-59
#> 3 1 bookMansfield Park -0.00614 0.0325 -0.189 8.50e- 1
#> 4 1 bookEmma 0.00350 0.0336 0.104 9.17e- 1
#> 5 1 bookNorthanger Abbey 0.00323 0.0394 0.0820 9.35e- 1
#> 6 1 bookPersuasion 0.0253 0.0443 0.571 5.68e- 1
#> 7 2 (Intercept) 0.0153 0.0165 0.925 3.56e- 1
#> 8 2 bookPride & Prejudice 0.00165 0.0234 0.0707 9.44e- 1
#> 9 2 bookMansfield Park 0.00167 0.0246 0.0680 9.46e- 1
#> 10 2 bookEmma 0.892 0.0381 23.4 2.84e-66
#> 11 2 bookNorthanger Abbey 0.00606 0.0317 0.191 8.49e- 1
#> 12 2 bookPersuasion -0.00107 0.0298 -0.0359 9.71e- 1
#> 13 3 (Intercept) 0.0197 0.0228 0.864 3.89e- 1
#> 14 3 bookPride & Prejudice -0.000835 0.0288 -0.0290 9.77e- 1
#> 15 3 bookMansfield Park 0.0147 0.0342 0.428 6.69e- 1
#> 16 3 bookEmma -0.000707 0.0305 -0.0232 9.82e- 1
#> 17 3 bookNorthanger Abbey 0.873 0.0461 18.9 4.93e-51
#> 18 3 bookPersuasion 0.103 0.0496 2.08 3.85e- 2
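One caveat when the result is longer than the 18 rows above: tidy() returns a tibble, and tibbles with more than 20 rows show only their first 10 rows by default. The sketch below uses a made-up stand-in for a tidied result (the `coefs` object, its term names, and the missing estimates are placeholders, not output from a real model) to show two ways of seeing every row.

```r
library(tibble)

# Hypothetical stand-in for tidy(prep): 4 topics x 11 terms = 44 rows.
# The term names and NA estimates are placeholders, not real model output.
coefs <- tibble(
  topic = rep(c(14, 40, 5, 41), each = 11),
  term = rep(c("(Intercept)", paste0("s(day)", 1:10)), times = 4),
  estimate = NA_real_
)

# Printing a long tibble truncates it; n = Inf prints every row.
print(coefs, n = Inf)

# Converting to a plain data frame is another way to avoid truncation.
coefs_df <- as.data.frame(coefs)
```

Either form can also be filtered by topic or written to a file, which is the control that summary() does not give you.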
Created on 2021-02-26 by the reprex package (v1.0.0)
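On the original question, it may help to check what s(day) puts into the regression. The stm documentation describes s() as a wrapper around splines::bs() that defaults to 10 degrees of freedom, so the ten rows in the summary are likely spline basis coefficients rather than ten individual days. A minimal sketch with base R only, using a made-up `day` vector as a stand-in for the real covariate:

```r
library(splines)

# Hypothetical stand-in for the `day` covariate: 150 distinct days,
# three documents per day.
day <- rep(1:150, each = 3)

# A cubic B-spline basis with 10 degrees of freedom, the documented
# default of stm's s(); this calls bs() directly rather than stm itself.
basis <- bs(day, df = 10)

# The number of regression terms equals the number of basis columns,
# no matter how many distinct days there are.
ncol(basis)
#> [1] 10
length(unique(day))
#> [1] 150
```

If that is what is happening here, no print setting will produce one coefficient per day. Plotting the estimateEffect object against day with method = "continuous" is the usual way to look at the fitted curve over the whole date range.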
Upvotes: 1