Nils_Denter

Evaluation of ldaseqmodel in gensim

Is there a way to evaluate the dynamic topic model (ldaseqmodel) the same way as the "normal" LDA model, i.e. in terms of perplexity and topic coherence? I know that these values are printed to the log at logging.INFO level, so another approach would be to save the INFO log to a text file and search it for these evaluation values after the simulation. If method 1 (code to evaluate an ldaseqmodel) doesn't exist, is it possible to save the INFO log to a text file? Here is my code to generate the ldaseqmodel:

from gensim import models, corpora
import csv
import logging
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)

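# Number of topics
Anzahl_Topics1 = 10

# Number of documents per time slice, in corpus order
Zeitabschnitte = [16, 19, 44, 51, 84, 122, 216, 290, 385, 441, 477, 375, 390, 408, 428, 192, 38]

# Input term-document matrix (CSV) and output paths
TDM_dateipfad = './1gramm/TDM_1gramm_1998_2014.csv'
dateiname_corpus = "./1gramm/corpus_DTM_1gramm.mm"
dateiname1_dtm = "./1gramm/DTM_1gramm_10.model"

ids = {}     # term id -> word, passed to the model as id2word
corpus = []  # one bag-of-words list of (term id, count) per document

# CSV layout: row 0 is a header, column 0 holds the word,
# and every further column holds the word counts of one document
with open(TDM_dateipfad, newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter=';', quotechar='|')
    for rownumber, row in enumerate(reader):
        for index, field in enumerate(row):
            if index == 0:
                if rownumber > 0:
                    ids[rownumber-1] = field  # word of term id rownumber-1
            else:
                if rownumber == 0:
                    corpus.append([])  # header cell: start a new document
                else:
                    corpus[index-1].append((rownumber-1, int(field)))

# Save the corpus in Matrix Market format
corpora.MmCorpus.serialize(dateiname_corpus, corpus)

# Train the dynamic topic model and save it
dtm1 = models.ldaseqmodel.LdaSeqModel(corpus=corpus, time_slice=Zeitabschnitte, id2word=ids,
                                      num_topics=Anzahl_Topics1, passes=1, chunksize=10000)
dtm1.save(dateiname1_dtm)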
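One thing worth checking before training: the gensim documentation for LdaSeqModel says it assumes that sum(time_slice) equals the number of documents, and with the CSV layout above every column after the first becomes one document. A small sanity check could look like this; check_time_slices is a hypothetical helper, not part of gensim.

```python
def check_time_slices(time_slice, corpus):
    """Raise ValueError unless the time slices add up to the number of documents."""
    total = sum(time_slice)
    if total != len(corpus):
        raise ValueError(
            f"time slices add up to {total} documents, "
            f"but the corpus has {len(corpus)}"
        )
    return total
```

Calling check_time_slices(Zeitabschnitte, corpus) right before the LdaSeqModel call would catch a mismatch early; with the list in the question the corpus must contain 3976 documents.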

Answers (1)

jhl

You're asking two very different questions.

Is it possible to save the logging.INFO into a text file?

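Yes. You can use this code to send your log to a file instead of the console. DEBUG level logging gives you more detailed information than INFO. Replace the existing basicConfig call in your script with this one, because basicConfig does nothing if the root logger already has handlers.

import logging
logging.basicConfig(filename='yourlogname.log', level=logging.DEBUG,
                    format='%(asctime)s : %(levelname)s : %(message)s')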
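Once the log has been written to a file, it can be searched after the run like any other text file. A minimal sketch using only the standard library; find_log_lines is a hypothetical helper, and the keyword you search for is an assumption: check your own log for the exact wording of the lines you need.

```python
from pathlib import Path


def find_log_lines(log_path, keyword):
    """Return every line of the log file that contains keyword (case-insensitive)."""
    needle = keyword.lower()
    matches = []
    # errors="replace" keeps the search going if the log has odd bytes
    with Path(log_path).open(encoding="utf-8", errors="replace") as log_file:
        for line in log_file:
            if needle in line.lower():
                matches.append(line.rstrip("\n"))
    return matches
```

For example, find_log_lines('yourlogname.log', 'perplexity') returns every matching line; whether such lines appear at all depends on what the model actually logs.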

You might also want to set up separate handlers, so that you get an INFO log in the console and a DEBUG log in a file. See the Python logging documentation for more info.
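A minimal sketch of that handler setup with the standard logging module: INFO and above go to the console, everything from DEBUG up goes to a file. The function name configure_logging and the default file name are illustrative. It would be called once, instead of logging.basicConfig, before training; gensim's loggers propagate to the root logger, so their messages reach both handlers.

```python
import logging

LOG_FORMAT = "%(asctime)s : %(levelname)s : %(message)s"


def configure_logging(log_path="yourlogname.log"):
    """Send INFO+ records to the console and DEBUG+ records to log_path.

    Returns the file handler so the caller can flush or close it.
    """
    formatter = logging.Formatter(LOG_FORMAT)

    root = logging.getLogger()
    # The root level must let through the lowest level any handler wants;
    # each handler then filters further on its own.
    root.setLevel(logging.DEBUG)

    console_handler = logging.StreamHandler()  # writes to sys.stderr
    console_handler.setLevel(logging.INFO)
    console_handler.setFormatter(formatter)

    # mode="w" overwrites the file on each run; use mode="a" to append instead
    file_handler = logging.FileHandler(log_path, mode="w", encoding="utf-8")
    file_handler.setLevel(logging.DEBUG)
    file_handler.setFormatter(formatter)

    root.addHandler(console_handler)
    root.addHandler(file_handler)
    return file_handler
```

Keeping the root logger at DEBUG and filtering per handler is what allows the two destinations to receive different levels from the same log calls.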

Is there a possibility to evaluate the DTM using perplexity and topic coherence?

Yes, use dtm_coherence (see the gensim LdaSeqModel documentation). It returns the top words of each topic for a given time slice, which you then pass to gensim's CoherenceModel to get a score. Coherence is generally a more useful measure than perplexity, in the sense of "do humans understand these topics". You will have to compute it for each time slice separately, though. If you want to compare two models, say a 10-topic vs. a 20-topic model, my recommendation would be to loop over the time slices for each model and plot the coherence scores to see whether one is consistently better. There is a nice tutorial in the DTM example notebook from the gensim developers.
