JGLEE

Reputation: 21

get_coherence() function error in Python while using LDA

I have a problem using gensim's CoherenceModel.

My code is:

import gensim
from gensim.models import CoherenceModel

def compute_coherence_values(dictionary, corpus, texts, limit, start, step):
    coherence_values = []
    model_list = []
    for num_topics in range(start, limit, step):
        # use the dictionary passed in, not the global id2word
        model = gensim.models.ldamodel.LdaModel(corpus=corpus, id2word=dictionary, num_topics=num_topics)
        model_list.append(model)

        # c_v coherence; get_coherence() is where the error is raised
        coherencemodel = CoherenceModel(model=model, texts=texts, dictionary=dictionary, coherence="c_v")
        coherence_values.append(coherencemodel.get_coherence())

    return model_list, coherence_values

coherence_values = []
model_list = []

# topic number
nt = pre_nt

start_ = nt
limit_ = nt + 1
step_ = 1

model_list1, coherence_values1 = compute_coherence_values(dictionary=id2word, corpus=corpus, texts=texts_wi_new,
                                                        start=start_, limit=limit_, step=step_)

And the error is:

Traceback (most recent call last):
  File "<input>", line 3, in <module>
  File "<input>", line 92, in compute_coherence_values
  File "D:\All Python\venv\lib\site-packages\gensim\models\coherencemodel.py", line 609, in get_coherence
    confirmed_measures = self.get_coherence_per_topic()
  File "D:\All Python\venv\lib\site-packages\gensim\models\coherencemodel.py", line 569, in get_coherence_per_topic
    self.estimate_probabilities(segmented_topics)
  File "D:\All Python\venv\lib\site-packages\gensim\models\coherencemodel.py", line 541, in estimate_probabilities
    self._accumulator = self.measure.prob(**kwargs)
  File "D:\All Python\venv\lib\site-packages\gensim\topic_coherence\probability_estimation.py", line 156, in p_boolean_sliding_window
    return accumulator.accumulate(texts, window_size)
  File "D:\All Python\venv\lib\site-packages\gensim\topic_coherence\text_analysis.py", line 444, in accumulate
    workers, input_q, output_q = self.start_workers(window_size)
  File "D:\All Python\venv\lib\site-packages\gensim\topic_coherence\text_analysis.py", line 478, in start_workers
    worker.start()
  File "C:\Users\lee96\AppData\Local\Programs\Python\Python37\Lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "C:\Users\lee96\AppData\Local\Programs\Python\Python37\Lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\lee96\AppData\Local\Programs\Python\Python37\Lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Users\lee96\AppData\Local\Programs\Python\Python37\Lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\lee96\AppData\Local\Programs\Python\Python37\Lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
BrokenPipeError: [Errno 32] Broken pipe

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\lee96\AppData\Local\Programs\Python\Python37\Lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\Users\lee96\AppData\Local\Programs\Python\Python37\Lib\multiprocessing\spawn.py", line 114, in _main
    prepare(preparation_data)
  File "C:\Users\lee96\AppData\Local\Programs\Python\Python37\Lib\multiprocessing\spawn.py", line 225, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Users\lee96\AppData\Local\Programs\Python\Python37\Lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
    run_name="__mp_main__")
  File "C:\Users\lee96\AppData\Local\Programs\Python\Python37\Lib\runpy.py", line 261, in run_path
    code, fname = _get_code_from_file(run_name, path_name)
  File "C:\Users\lee96\AppData\Local\Programs\Python\Python37\Lib\runpy.py", line 231, in _get_code_from_file
    with open(fname, "rb") as f:
OSError: [Errno 22] Invalid argument: 'D:\\All Python\\<input>'

The error is raised by this call:

coherencemodel.get_coherence()

I'm using PyCharm. How can I solve this?


Upvotes: 2

Views: 1678

Answers (2)

Jamie

Reputation: 11

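I did some more research on this and found a few other articles that helped. Ultimately, the error seems to come from how multiprocessing works on Windows: computing c_v coherence makes gensim start worker processes, and on Windows each new process is started with the "spawn" method, which re-runs the main script in the child.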

Useful reading: the question "Where to put freeze_support() in a Python script?" and the multiprocessing documentation for Windows: https://docs.python.org/2/library/multiprocessing.html#windows
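The last frames of the traceback in the question show the failing step: the spawned child calls runpy.run_path(..., run_name="__mp_main__") on the parent's main script. When the code is typed into PyCharm's Python console, that "script" is <input>, which is not a file on disk. The sketch below approximates that step with the standard library only; rerun_like_spawn is a hypothetical helper, not part of multiprocessing, and it only imitates what the child does.

```python
import os
import runpy
import tempfile


def rerun_like_spawn(main_path):
    """Hypothetical helper: re-execute a main script from its path, roughly
    the way a spawned child does (see _fixup_main_from_path in the traceback)."""
    return runpy.run_path(main_path, run_name="__mp_main__")


# A real script file on disk can be re-executed by the child...
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write("x = 1 + 1\n")
    script_path = f.name
try:
    ok = rerun_like_spawn(script_path)["x"]
finally:
    os.remove(script_path)

# ...but a console pseudo-file such as <input> does not exist on disk.
console_path = os.path.join(tempfile.gettempdir(), "<input>")
try:
    rerun_like_spawn(console_path)
    console_failed = False
except OSError:  # FileNotFoundError on Linux/macOS, [Errno 22] on Windows
    console_failed = True

print(ok, console_failed)  # 2 True
```

This is why running the same code as a saved .py file, rather than from the console, is a prerequisite for the fix below.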

What worked for me was placing all of my code under the following line:

from multiprocessing import freeze_support

if __name__ == '__main__':
    freeze_support()  # only needed when freezing to an executable; harmless otherwise
    model_list, coherence_values = compute_coherence_values(dictionary=dictionary, corpus=corpus, texts=texts, start=start, limit=limit, step=step)
    # pick the model with the highest coherence score
    max_value = max(coherence_values)
    max_index = coherence_values.index(max_value)

    best_model = model_list[max_index]

    ldamodel = best_model
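Why the guard helps: a spawned child re-runs the main script under the module name __mp_main__ rather than __main__, so code inside if __name__ == '__main__': is skipped in the child instead of starting yet more worker processes. The sketch below simulates the two module names with runpy instead of actually starting processes; SCRIPT and run_script_as are hypothetical stand-ins, not gensim code.

```python
import os
import runpy
import tempfile

# Hypothetical stand-in for the training script: the guarded call plays the
# role of compute_coherence_values(...) / get_coherence().
SCRIPT = '''
calls = []

def expensive_work():
    calls.append("work")

if __name__ == "__main__":
    expensive_work()
'''


def run_script_as(run_name):
    """Run SCRIPT from a temporary file under the given module name and
    return a copy of the resulting module globals."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(SCRIPT)
        path = f.name
    try:
        return runpy.run_path(path, run_name=run_name)
    finally:
        os.remove(path)


parent_calls = run_script_as("__main__")["calls"]    # how you launch the script
child_calls = run_script_as("__mp_main__")["calls"]  # how a spawned child re-runs it

print(parent_calls, child_calls)  # ['work'] []
```

Only the parent does the expensive work; the child just re-defines the functions it needs, which is what multiprocessing expects on Windows.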

I'm not the most experienced Python developer, but this got it working for what I needed. If others have better suggestions, I'm all ears :)

Upvotes: 0

Jamie

Reputation: 11

I'm having exactly the same issue with the same code. It works fine when I run it from the Spyder IDE, but when I plug it into Power BI, it errors out. So far, I've pulled it out of the function and loop into the basic lines below. The LDA and coherence models are created fine, but the error is raised as soon as get_coherence() is called.

import gensim
from gensim.models import CoherenceModel

model = gensim.models.ldamodel.LdaModel(corpus, num_topics=5, id2word=dictionary, passes=10)

coherencemodel = CoherenceModel(model=model, texts=texts, dictionary=dictionary, coherence='c_v')

test = coherencemodel.get_coherence()

Below is part of the error message I got:

RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase.

This probably means that you are not using fork to start your child processes and you have forgotten to use the proper idiom in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

The "freeze_support()" line can be omitted if the program is not going to be frozen to produce an executable.

Details: DataSourceKind=Python DataSourcePath=Python Message=Python script error.

Upvotes: 1
