DanD

set.seed() in quanteda's lda function

Each time I run this code, I get a different result:

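library(seededlda)  # textmodel_lda() comes from seededlda

# dfmt is a document-feature matrix built earlier with quanteda
set.seed(42)
lda_seq <- textmodel_lda(dfmt, k = 5, gamma = 0.5,
                         batch_size = 0.01, auto_iter = TRUE,
                         verbose = FALSE)

terms(lda_seq)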

The function is textmodel_lda() from the seededlda package, which builds on quanteda.

How can I get reproducible results?

I tried different seeds, but every time I rerun the code with the same seed the results are completely different. It seems there is no link between set.seed() and the seededlda package.

Answers (1)

Carl

Setting options(seededlda_threads = 1) gives reproducible results:

(The documentation does not make clear how to set a seed for each sub-process when multi-threading, so, as r2evans suggests in the comments, it may be worth raising this as a GitHub issue.)

library(quanteda)
library(seededlda)

options(seededlda_threads = 1)  # fit on a single thread so set.seed() makes runs repeatable

corp <- data_corpus_moviereviews
toks <- tokens(corp, remove_punct = TRUE, remove_symbols = TRUE, 
               remove_numbers = TRUE, remove_url = TRUE)

dfmt <- dfm(toks) |> 
  dfm_remove(stopwords("en")) |>
  dfm_remove("*@*") |>
  dfm_trim(max_docfreq = 0.1, docfreq_type = "prop")

set.seed(42)
lda_seq <- textmodel_lda(dfmt, k = 5, gamma = 0.5, 
                         batch_size = 0.01, auto_iter = TRUE,
                         verbose = FALSE)

x <- terms(lda_seq)

set.seed(42)
lda_seq <- textmodel_lda(dfmt, k = 5, gamma = 0.5, 
                         batch_size = 0.01, auto_iter = TRUE,
                         verbose = FALSE)

y <- terms(lda_seq)

waldo::compare(x, y)
#> ✔ No differences

Created on 2024-03-30 with reprex v2.1.0
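If you refit models often, the two steps above (single-threaded option plus `set.seed()`) could be wrapped in a small helper that also restores the previous option value afterwards. This is a minimal sketch: `with_reproducible_lda()` is a hypothetical name, not part of seededlda, and it assumes, as the reprex suggests, that seededlda reads `getOption("seededlda_threads")` at fit time.

```r
# Hypothetical helper (not part of seededlda): evaluate `expr` single-threaded
# with a fixed seed, then restore whatever seededlda_threads was before.
with_reproducible_lda <- function(seed, expr) {
  old <- options(seededlda_threads = 1)  # options() returns the previous values
  on.exit(options(old), add = TRUE)      # restore them even if expr errors
  set.seed(seed)
  expr                                   # lazily evaluated here, after the seed is set
}
```

Usage, with the same model as above:

```r
x <- with_reproducible_lda(42, terms(textmodel_lda(dfmt, k = 5, gamma = 0.5,
                                                   batch_size = 0.01, auto_iter = TRUE,
                                                   verbose = FALSE)))
```

Because R evaluates function arguments lazily, the model is only fitted when `expr` is reached, i.e. after the option and the seed have been set.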
