I am trying to analyze a translation file of English-French sentence pairs with Bicleaner (https://github.com/bitextor/bicleaner). My test corpus has ten sentence pairs, formatted as required, but every time I run the command below to generate the classified file I hit the same error.
CODE:
!bicleaner-ai-classify \
--scol 3 --tcol 4 \
corpus.en-de.tsv \
corpus.en-de.classifed.tsv \
bitextor/bicleaner-ai-full-en-fr
I always receive the following error:
2024-03-26 12:07:26,386 - INFO - Arguments processed
2024-03-26 12:07:26,387 - INFO - Starting process
Traceback (most recent call last):
File "kenlm.pyx", line 139, in kenlm.Model.__init__
RuntimeError: lm/model.cc:49 in void lm::ngram::detail::{anonymous}::CheckCounts(const std::vector<long unsigned int>&) threw FormatLoadException because `counts.size() > 6'.
This model has order 7 but KenLM was compiled to support up to 6. If your build system supports changing KENLM_MAX_ORDER, change it there and recompile. With cmake:
cmake -DKENLM_MAX_ORDER=10 ..
With Moses:
bjam --max-kenlm-order=10 -a
Otherwise, edit lm/max_order.hh.
The above exception was the direct cause of the following exception:
Traceback (most recent call last): [...]
I have repeatedly run cmake -DKENLM_MAX_ORDER=10 and recompiled, but the error persists. I have also edited lm/max_order.hh to set the maximum order manually. The file previously contained:
#ifndef LM_MAX_ORDER_H
#define LM_MAX_ORDER_H
/* IF YOUR BUILD SYSTEM PASSES -DKENLM_MAX_ORDER, THEN CHANGE THE BUILD SYSTEM.
* If not, this is the default maximum order.
* Having this limit means that State can be
* (kMaxOrder - 1) * sizeof(float) bytes instead of
* sizeof(float*) + (kMaxOrder - 1) * sizeof(float) + malloc overhead
*/
#ifndef KENLM_ORDER_MESSAGE
#define KENLM_ORDER_MESSAGE "If your build system supports changing KENLM_MAX_ORDER, change it there and recompile. With cmake:\n cmake -DKENLM_MAX_ORDER=10 ..\nWith Moses:\n bjam --max-kenlm-order=10 -a\nOtherwise, edit lm/max_order.hh."
#endif
#endif // LM_MAX_ORDER_H
Now it is:
#ifndef LM_MAX_ORDER_H
#define LM_MAX_ORDER_H
#ifndef KENLM_MAX_ORDER
#define KENLM_MAX_ORDER 10
#endif
#ifndef KENLM_ORDER_MESSAGE
#define KENLM_ORDER_MESSAGE "If your build system supports changing KENLM_MAX_ORDER, change it there and recompile. With cmake:\n cmake -DKENLM_MAX_ORDER=10 ..\nWith Moses:\n bjam --max-kenlm-order=10 -a\nOtherwise, edit lm/max_order.hh."
#endif
#endif // LM_MAX_ORDER_H
I then recompiled with:
cmake ..
make -j4
The error is unchanged. I am on Linux, running the command from a Jupyter notebook inside a dedicated conda environment. I have followed every step suggested in the error message, and even after manually raising the limit from 6 to 10 in the source and recompiling without errors, I still get the same exception.
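Since the traceback comes from kenlm.pyx, the error is raised by the Python kenlm extension module and not by the binaries produced by cmake and make. One thing that may be worth checking is which interpreter the notebook kernel uses and where it loads kenlm from. The snippet below is only a diagnostic sketch: the helper names locate_module and describe_environment are made up for illustration, and it rests on the assumption that the Python module may have been installed separately from the recompiled sources.

```python
import importlib.util
import sys


def locate_module(name):
    """Return the file Python would load `name` from, or None if it is not importable."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


def describe_environment(module_name="kenlm"):
    """Collect the interpreter path and the on-disk location of `module_name`."""
    return {
        "interpreter": sys.executable,
        "module": module_name,
        "location": locate_module(module_name),
    }


if __name__ == "__main__":
    # Run this in a notebook cell of the same kernel that calls bicleaner-ai-classify.
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If the reported location is inside the conda environment's site-packages and not inside the recompiled source tree, that would suggest the extension module was built earlier with the old limit and has not been rebuilt.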