Reputation: 21
I'm having a problem using gensim's CoherenceModel.
My code is:
import gensim
from gensim.models import CoherenceModel

def compute_coherence_values(dictionary, corpus, texts, limit, start, step):
    # Train one LDA model per topic count and record its c_v coherence.
    coherence_values = []
    model_list = []
    for num_topics in range(start, limit, step):
        # Use the dictionary passed in rather than relying on a global id2word.
        model = gensim.models.ldamodel.LdaModel(corpus=corpus, id2word=dictionary, num_topics=num_topics)
        model_list.append(model)
        coherencemodel = CoherenceModel(model=model, texts=texts, dictionary=dictionary, coherence="c_v")
        coherence_values.append(coherencemodel.get_coherence())
    return model_list, coherence_values
coherence_values = []
model_list = []
# topic number
nt = pre_nt
start_ = nt
limit_ = nt + 1
step_ = 1
model_list1, coherence_values1 = compute_coherence_values(dictionary=id2word, corpus=corpus, texts=texts_wi_new,
                                                          start=start_, limit=limit_, step=step_)
The error is:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\lee96\AppData\Local\Programs\Python\Python37\Lib\multiprocessing\spawn.py", line 105, in spawn_main
Traceback (most recent call last):
File "<input>", line 3, in <module>
File "<input>", line 92, in compute_coherence_values
File "D:\All Python\venv\lib\site-packages\gensim\models\coherencemodel.py", line 609, in get_coherence
confirmed_measures = self.get_coherence_per_topic()
File "D:\All Python\venv\lib\site-packages\gensim\models\coherencemodel.py", line 569, in get_coherence_per_topic
self.estimate_probabilities(segmented_topics)
File "D:\All Python\venv\lib\site-packages\gensim\models\coherencemodel.py", line 541, in estimate_probabilities
self._accumulator = self.measure.prob(**kwargs)
File "D:\All Python\venv\lib\site-packages\gensim\topic_coherence\probability_estimation.py", line 156, in p_boolean_sliding_window
return accumulator.accumulate(texts, window_size)
File "D:\All Python\venv\lib\site-packages\gensim\topic_coherence\text_analysis.py", line 444, in accumulate
workers, input_q, output_q = self.start_workers(window_size)
File "D:\All Python\venv\lib\site-packages\gensim\topic_coherence\text_analysis.py", line 478, in start_workers
worker.start()
File "C:\Users\lee96\AppData\Local\Programs\Python\Python37\Lib\multiprocessing\process.py", line 112, in start
self._popen = self._Popen(self)
File "C:\Users\lee96\AppData\Local\Programs\Python\Python37\Lib\multiprocessing\context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\Users\lee96\AppData\Local\Programs\Python\Python37\Lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "C:\Users\lee96\AppData\Local\Programs\Python\Python37\Lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
reduction.dump(process_obj, to_child)
File "C:\Users\lee96\AppData\Local\Programs\Python\Python37\Lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
BrokenPipeError: [Errno 32] Broken pipe
exitcode = _main(fd)
File "C:\Users\lee96\AppData\Local\Programs\Python\Python37\Lib\multiprocessing\spawn.py", line 114, in _main
prepare(preparation_data)
File "C:\Users\lee96\AppData\Local\Programs\Python\Python37\Lib\multiprocessing\spawn.py", line 225, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "C:\Users\lee96\AppData\Local\Programs\Python\Python37\Lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
run_name="__mp_main__")
File "C:\Users\lee96\AppData\Local\Programs\Python\Python37\Lib\runpy.py", line 261, in run_path
code, fname = _get_code_from_file(run_name, path_name)
File "C:\Users\lee96\AppData\Local\Programs\Python\Python37\Lib\runpy.py", line 231, in _get_code_from_file
with open(fname, "rb") as f:
OSError: [Errno 22] Invalid argument: 'D:\\All Python\\<input>'
The error occurs at this call:
coherencemodel.get_coherence()
I'm running the code in PyCharm. How can I solve it?
Upvotes: 2
Views: 1678
Reputation: 11
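I did some more research on this and found a few other articles that were helpful, but ultimately the errors seem to come from how multiprocessing works on Windows. Windows starts each child process by importing the main script again, so any code that launches worker processes has to sit behind a main guard.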
where to put freeze_support() in a Python script? https://docs.python.org/2/library/multiprocessing.html#windows
What worked for me was placing all of my code under the following guard:
from multiprocessing import freeze_support

if __name__ == '__main__':
    freeze_support()
    model_list, coherence_values = compute_coherence_values(dictionary=dictionary, corpus=corpus, texts=texts, start=start, limit=limit, step=step)
    # Keep the model with the highest coherence score.
    max_value = max(coherence_values)
    max_index = coherence_values.index(max_value)
    best_model = model_list[max_index]
    ldamodel = best_model
I'm not the greatest developer within Python, but I got it working for what I needed. If others have better suggestions, I'm all eyes and ears :)
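For anyone who wants to see the pattern in isolation, here is a minimal sketch using only the standard library. The names square and main are made up for illustration and are not part of gensim; the pool simply stands in for the worker processes that get_coherence() starts internally, as the traceback in the question suggests. The idea is that everything that launches workers sits behind the guard, while plain function definitions stay at module level so the child processes can import them.

```python
import multiprocessing as mp


def square(x):
    # Worker function (hypothetical). It is defined at module level so that
    # child processes can find it when they import this file.
    return x * x


def main():
    # Anything that starts worker processes goes in here, not at top level.
    with mp.Pool(processes=2) as pool:
        return pool.map(square, [1, 2, 3])


if __name__ == "__main__":
    # Only needed when the script is frozen into an executable; harmless otherwise.
    mp.freeze_support()
    print(main())
```

The point is where the pool gets created, not what it computes. In the gensim case, the equivalent of main() would be the call that ends up in get_coherence().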
Upvotes: 0
Reputation: 11
I'm having the exact same issue with the exact same code. The code works fine when I run it from the Spyder IDE, but when I plug it into Power BI, it errors out. So far, I've pulled it out of the function and loop into the basic lines below. The LDA and coherence models build fine, but for some reason the call to get_coherence() errors out.
model = gensim.models.ldamodel.LdaModel(corpus, num_topics=5, id2word=dictionary, passes=10)
coherencemodel = CoherenceModel(model=model, texts=texts, dictionary=dictionary, coherence='c_v')
test = coherencemodel.get_coherence()
Below is part of the error message I received back:
RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your child processes and you have forgotten to use the proper idiom in the main module:
    if __name__ == '__main__':
        freeze_support()
        ...
The "freeze_support()" line can be omitted if the program is not going to be frozen to produce an executable.
Details: DataSourceKind=Python DataSourcePath=Python Message=Python script error.
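Since the message mentions fork, it can help to check which start method the interpreter is actually using in a given environment. A rough sketch with the standard library follows. needs_main_guard is a hypothetical helper name, and the rule it encodes (anything other than fork imports the main module again) is my reading of the multiprocessing documentation, not something specific to Power BI or gensim.

```python
import multiprocessing as mp
import sys


def needs_main_guard(start_method):
    # With "spawn" or "forkserver", the main module is imported again in a
    # new process, so unguarded top-level code would run a second time.
    return start_method != "fork"


if __name__ == "__main__":
    # On Windows the only available start method is "spawn".
    method = mp.get_start_method()
    print(f"platform={sys.platform} start_method={method} "
          f"guard_needed={needs_main_guard(method)}")
```

If this reports anything other than fork, the code that calls get_coherence() most likely needs to run under the if __name__ == '__main__': idiom quoted in the error message.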
Upvotes: 1