Reputation: 53
Each time I run this code, I get a different result:
set.seed(42)
lda_seq <- textmodel_lda(dfmt, k = 5, gamma = 0.5,
                         batch_size = 0.01, auto_iter = TRUE,
                         verbose = FALSE)
terms(lda_seq)
The function is from the seededlda package, which is based on quanteda.
How can I get reproducible results?
I tried different seeds, but the results are completely different each time I run the code, even with the same seed. Note that set.seed() does not appear to have any effect on the seededlda package.
Upvotes: 3
Views: 98
Reputation: 7540
Setting options(seededlda_threads = 1) gives reproducible results:
(The documentation does not make clear how to set a seed for each of the sub-processes when multi-threading, so, as r2evans suggests in the comments, it may be worth raising this as a GitHub issue.)
library(quanteda)
library(seededlda)
options(seededlda_threads = 1)
corp <- data_corpus_moviereviews
toks <- tokens(corp, remove_punct = TRUE, remove_symbols = TRUE,
               remove_numbers = TRUE, remove_url = TRUE)
dfmt <- dfm(toks) |>
  dfm_remove(stopwords("en")) |>
  dfm_remove("*@*") |>
  dfm_trim(max_docfreq = 0.1, docfreq_type = "prop")
set.seed(42)
lda_seq <- textmodel_lda(dfmt, k = 5, gamma = 0.5,
                         batch_size = 0.01, auto_iter = TRUE,
                         verbose = FALSE)
x <- terms(lda_seq)
set.seed(42)
lda_seq <- textmodel_lda(dfmt, k = 5, gamma = 0.5,
                         batch_size = 0.01, auto_iter = TRUE,
                         verbose = FALSE)
y <- terms(lda_seq)
waldo::compare(x, y)
#> ✔ No differences
Created on 2024-03-30 with reprex v2.1.0
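If you want to repeat the check more than twice, or reuse it for other models, a small helper along these lines may be convenient. This is only a sketch: `is_reproducible()` and its arguments are names made up for illustration, not part of seededlda or quanteda. It resets the seed before each call and compares every result with the first one using `identical()`.

```r
# Hypothetical helper (not part of seededlda or quanteda): call `fit`
# n_runs times, resetting the RNG seed before each call, and report
# whether every result is identical to the first.
is_reproducible <- function(fit, seed = 42, n_runs = 3) {
  runs <- lapply(seq_len(n_runs), function(i) {
    set.seed(seed)
    fit()
  })
  all(vapply(runs[-1], identical, logical(1), y = runs[[1]]))
}

# Base-R sanity checks, so the helper can be tried without any packages.
seeded <- is_reproducible(function() sample(100, 5))  # respects set.seed()

counter <- 0
unseeded <- is_reproducible(function() {
  counter <<- counter + 1  # state that set.seed() does not control
  counter
})

print(c(seeded = seeded, unseeded = unseeded))

# With the objects from the reprex above (needs quanteda and seededlda):
# options(seededlda_threads = 1)
# is_reproducible(function() {
#   terms(textmodel_lda(dfmt, k = 5, gamma = 0.5, batch_size = 0.01,
#                       auto_iter = TRUE, verbose = FALSE))
# })
```

The counter example stands in for any source of variation that `set.seed()` does not reach, which is how the multi-threaded runs appear to behave here. Because `identical()` is strict, the helper compares the output of `terms()` rather than the whole model object, just as the reprex does.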
Upvotes: 4