Sandis Sjomkāns

Reputation: 1

Can you print more than 11 covariates for summary.estimateEffect?

I have created an stm topic model and have a problem with summary.estimateEffect: my data covers around 150 days, yet the regression summary prints estimates for only 10 days.

parlPrevFit <- stm(documents = out$documents, vocab = out$vocab, K = 0,
                   prevalence = ~ s(day), max.em.its = 150,
                   data = out$meta, init.type = "Spectral")

prep <- estimateEffect(c(14, 40, 5, 41) ~ s(day), parlPrevFit,
                       metadata = meta, uncertainty = "Global")

summary(prep, topics = c(14, 40, 5, 41))

Topic 14 coefficients: https://prnt.sc/105pg1a

Could anyone suggest how to print estimates for more than 10 days, please?

Upvotes: 0

Views: 249

Answers (1)

Julia Silge

Reputation: 11643

Instead of using summary(), which you don't have much control over, load the tidytext package and use tidy() instead.

Let's walk through an example where we train a topic model on Jane Austen's novels, treating each chapter as a document:

library(tidyverse)
library(tidytext)
library(stm)
#> stm v1.3.6 successfully loaded. See ?stm for help. 
#>  Papers, resources, and other materials at structuraltopicmodel.com
library(janeaustenr)

books <- austen_books() %>%
  group_by(book) %>%
  mutate(chapter = cumsum(str_detect(text, regex("^chapter ", ignore_case = TRUE)))) %>%
  ungroup() %>%
  filter(chapter > 0) %>%
  unite(document, book, chapter, remove = FALSE)

austen_sparse <- books %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words) %>%
  count(document, word) %>%
  cast_sparse(document, word, n)
#> Joining, by = "word"

Let's train a topic model with 6 topics (there are 6 books):

topic_model <- stm(
  austen_sparse, 
  K = 6,
  init.type = "Spectral",
  verbose = FALSE
)

Let's make a data set to use in estimateEffect():

chapters <- books %>%
  group_by(document) %>% 
  summarize(text = str_c(text, collapse = " ")) %>%
  ungroup() %>%
  inner_join(books %>%
               distinct(document, book))
#> Joining, by = "document"

chapters
#> # A tibble: 269 x 3
#>    document text                                                           book 
#>    <chr>    <chr>                                                          <fct>
#>  1 Emma_1   "CHAPTER I   Emma Woodhouse, handsome, clever, and rich, with… Emma 
#>  2 Emma_10  "CHAPTER X   Though now the middle of December, there had yet… Emma 
#>  3 Emma_11  "CHAPTER XI   Mr. Elton must now be left to himself. It was n… Emma 
#>  4 Emma_12  "CHAPTER XII   Mr. Knightley was to dine with them--rather ag… Emma 
#>  5 Emma_13  "CHAPTER XIII   There could hardly be a happier creature in t… Emma 
#>  6 Emma_14  "CHAPTER XIV   Some change of countenance was necessary for e… Emma 
#>  7 Emma_15  "CHAPTER XV   Mr. Woodhouse was soon ready for his tea; and w… Emma 
#>  8 Emma_16  "CHAPTER XVI   The hair was curled, and the maid sent away, a… Emma 
#>  9 Emma_17  "CHAPTER XVII   Mr. and Mrs. John Knightley were not detained… Emma 
#> 10 Emma_18  "CHAPTER XVIII   Mr. Frank Churchill did not come. When the t… Emma 
#> # … with 259 more rows

Now let's estimate regressions from our topic model, for our first three topics and our data set of "chapter" documents:

effects <- estimateEffect(1:3 ~ book, topic_model, chapters)

summary(effects)
#> 
#> Call:
#> estimateEffect(formula = 1:3 ~ book, stmobj = topic_model, metadata = chapters)
#> 
#> 
#> Topic 1:
#> 
#> Coefficients:
#>                        Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)            0.018033   0.023726   0.760    0.448    
#> bookPride & Prejudice  0.799555   0.037140  21.528   <2e-16 ***
#> bookMansfield Park    -0.006387   0.032662  -0.196    0.845    
#> bookEmma               0.003188   0.033393   0.095    0.924    
#> bookNorthanger Abbey   0.002535   0.039017   0.065    0.948    
#> bookPersuasion         0.025725   0.044281   0.581    0.562    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> 
#> Topic 2:
#> 
#> Coefficients:
#>                        Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)            0.015289   0.016478   0.928    0.354    
#> bookPride & Prejudice  0.001785   0.023489   0.076    0.939    
#> bookMansfield Park     0.001616   0.024664   0.066    0.948    
#> bookEmma               0.892516   0.037833  23.591   <2e-16 ***
#> bookNorthanger Abbey   0.006032   0.031530   0.191    0.848    
#> bookPersuasion        -0.001142   0.030052  -0.038    0.970    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> 
#> Topic 3:
#> 
#> Coefficients:
#>                         Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)            0.0196151  0.0225115   0.871   0.3844    
#> bookPride & Prejudice -0.0004909  0.0286302  -0.017   0.9863    
#> bookMansfield Park     0.0148960  0.0341272   0.436   0.6628    
#> bookEmma              -0.0004006  0.0301741  -0.013   0.9894    
#> bookNorthanger Abbey   0.8730570  0.0457994  19.063   <2e-16 ***
#> bookPersuasion         0.1030537  0.0495148   2.081   0.0384 *  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This example doesn't run into the printing limitation you mentioned, but you can sidestep any such problem with tidy(), which returns the regression results themselves as a data frame:

tidy(effects)
#> # A tibble: 18 x 6
#>    topic term                   estimate std.error statistic  p.value
#>    <int> <chr>                     <dbl>     <dbl>     <dbl>    <dbl>
#>  1     1 (Intercept)            0.0179      0.0238    0.753  4.52e- 1
#>  2     1 bookPride & Prejudice  0.799       0.0373   21.4    1.09e-59
#>  3     1 bookMansfield Park    -0.00614     0.0325   -0.189  8.50e- 1
#>  4     1 bookEmma               0.00350     0.0336    0.104  9.17e- 1
#>  5     1 bookNorthanger Abbey   0.00323     0.0394    0.0820 9.35e- 1
#>  6     1 bookPersuasion         0.0253      0.0443    0.571  5.68e- 1
#>  7     2 (Intercept)            0.0153      0.0165    0.925  3.56e- 1
#>  8     2 bookPride & Prejudice  0.00165     0.0234    0.0707 9.44e- 1
#>  9     2 bookMansfield Park     0.00167     0.0246    0.0680 9.46e- 1
#> 10     2 bookEmma               0.892       0.0381   23.4    2.84e-66
#> 11     2 bookNorthanger Abbey   0.00606     0.0317    0.191  8.49e- 1
#> 12     2 bookPersuasion        -0.00107     0.0298   -0.0359 9.71e- 1
#> 13     3 (Intercept)            0.0197      0.0228    0.864  3.89e- 1
#> 14     3 bookPride & Prejudice -0.000835    0.0288   -0.0290 9.77e- 1
#> 15     3 bookMansfield Park     0.0147      0.0342    0.428  6.69e- 1
#> 16     3 bookEmma              -0.000707    0.0305   -0.0232 9.82e- 1
#> 17     3 bookNorthanger Abbey   0.873       0.0461   18.9    4.93e-51
#> 18     3 bookPersuasion         0.103       0.0496    2.08   3.85e- 2
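
One thing to watch for: tidy() returns a tibble, and a tibble with more than 20 rows prints only its first 10 by default, so a model with many terms can look truncated again. Here is a minimal sketch of printing every row. To keep it runnable on its own, it assumes the gadarianFit model and gadarian data that ship with stm rather than the Austen model above; swap in your own estimateEffect() result (such as `effects` or `prep`) as needed.

```r
library(stm)
library(tidytext)

# Assumption: gadarianFit (a fitted stm) and gadarian (its metadata)
# are example objects bundled with the stm package
prep <- estimateEffect(1:3 ~ treatment, gadarianFit, gadarian)

# One row per topic and term, returned as a tibble
coefs <- tidy(prep)

# n = Inf asks the tibble print method to show every row
print(coefs, n = Inf)

# A plain data frame also prints in full, if you prefer that
as.data.frame(coefs)
```

Because the result is an ordinary data frame, you can also filter it to a single topic or write it to a file instead of reading it off the console.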

Created on 2021-02-26 by the reprex package (v1.0.0)
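
A possible explanation for the 10 "days" in the question: as far as I know, stm's s() is a thin wrapper around splines::bs() that defaults to 10 degrees of freedom. If that holds, s(day) does not create one coefficient per day. It expands day into 10 spline basis columns, so a summary with an intercept plus 10 terms would be the complete regression and not a truncated printout. The sketch below illustrates the idea with base R only; the `y` values are made up, and bs(day, df = 10) stands in for s(day) under that assumption.

```r
library(splines)  # ships with base R

set.seed(123)
day <- 1:150

# Made-up response, only so that lm() has something to fit
y <- sin(day / 25) + rnorm(length(day), sd = 0.1)

# Assumption: stm's s(day) behaves like bs(day, df = 10) by default
basis <- bs(day, df = 10)
dim(basis)
#> [1] 150  10

# 150 distinct days still give an intercept plus 10 spline coefficients
fit <- lm(y ~ bs(day, df = 10))
n_coef <- length(coef(fit))
n_coef
#> [1] 11

# A larger df gives a more flexible curve and more coefficients
fit_20 <- lm(y ~ bs(day, df = 20))
n_coef_20 <- length(coef(fit_20))
n_coef_20
#> [1] 21
```

If you want a more flexible curve over the 150 days, passing a larger df to the spline (for example s(day, df = 20), assuming s() forwards df to bs()) should produce more terms. The individual spline coefficients are hard to interpret on their own, so plotting the estimated effect over day is usually more informative than reading the table.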

Upvotes: 1
