Reputation: 41
I would like to use TreeTagger for chunking German text inside a UIMA pipeline. Chunking works fine when I run the tagger from the command line, but it fails with the following error when used in the pipeline:
org.apache.uima.analysis_engine.AnalysisEngineProcessException: Annotator processing failed.
at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:401)
at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:308)
at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:570)
at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.<init>(ASB_impl.java:412)
at org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:344)
at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:265)
at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:269)
at org.apache.uima.fit.pipeline.SimplePipeline.runPipeline(SimplePipeline.java:150)
at de.fraunhofer.fkie.re_analysis.RA_pipeline.main(RA_pipeline.java:107)
Caused by: java.lang.NullPointerException: charsetName
at java.io.InputStreamReader.<init>(InputStreamReader.java:99)
at org.annolab.tt4j.TreeTaggerWrapper$Reader.<init>(TreeTaggerWrapper.java:946)
at org.annolab.tt4j.TreeTaggerWrapper.process(TreeTaggerWrapper.java:598)
at de.tudarmstadt.ukp.dkpro.core.treetagger.TreeTaggerChunker.process(TreeTaggerChunker.java:293)
at org.apache.uima.analysis_component.JCasAnnotator_ImplBase.process(JCasAnnotator_ImplBase.java:48)
at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:385)
... 8 more
I suppose I should set the parameter "Chunk_Mapping_Location", but I don't know which file it should point to. The chunker is initialised as follows:
AnalysisEngineDescription chunker =
    AnalysisEngineFactory.createEngineDescription(
        TreeTaggerChunker.class,
        TreeTaggerChunker.PARAM_EXECUTABLE_PATH, "C:/TreeTagger/bin/tree-tagger.exe",
        TreeTaggerChunker.PARAM_MODEL_LOCATION, "C:/TreeTagger/lib/german-chunker-utf8.par",
        TreeTaggerChunker.PARAM_PERFORMANCE_MODE, true,
        TreeTaggerChunker.PARAM_PRINT_TAGSET, true,
        TreeTaggerChunker.PARAM_LANGUAGE, "de"
    );
Upvotes: 0
Views: 67
Reputation: 10895
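It looks like TreeTaggerChunker is missing a PARAM_MODEL_ENCODING parameter, which prevents it from being used with directly specified models. Judging by your stack trace, no model encoding is set, so a null charset name reaches InputStreamReader and causes the NullPointerException. I have opened an issue.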
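For illustration, the bottom of the stack trace can be reproduced with plain JDK classes. This is only a minimal sketch, assuming the wrapper ends up calling the InputStreamReader constructor with a null charset name; the class and method names (CharsetNameDemo, throwsNpe) are made up for this example and are not part of tt4j or DKPro Core.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;

// Hypothetical demo class, not part of tt4j or DKPro Core.
public class CharsetNameDemo {

    // Returns true if creating an InputStreamReader with the given
    // charset name fails with a NullPointerException.
    static boolean throwsNpe(String charsetName) {
        InputStream in = new ByteArrayInputStream(new byte[0]);
        try {
            new InputStreamReader(in, charsetName);
            return false;
        } catch (NullPointerException e) {
            // Same exception type as the "Caused by" entry in the question.
            return true;
        } catch (UnsupportedEncodingException e) {
            // Unknown charset name: a different failure, not the one shown here.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("null charset name -> NPE? " + throwsNpe(null));
        System.out.println("\"UTF-8\"           -> NPE? " + throwsNpe("UTF-8"));
    }
}
```

Passing an explicit encoding such as "UTF-8" avoids the exception, which is why the missing encoding parameter matters for models given by file path.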
You can get around this by packaging the TreeTagger models as JARs using the build.xml Ant script included with DKPro Core. The process is described in the DKPro Core developer documentation.
Disclosure: I am one of the DKPro Core developers.
Upvotes: 0