MichaDe

Reputation: 41

TreeTagger can't find charsetName when used in a UIMA pipeline

I would like to use TreeTagger for chunking German text inside a UIMA pipeline. Chunking works fine when I run the tagger from the command line, but it causes the following error when used in the pipeline:

    org.apache.uima.analysis_engine.AnalysisEngineProcessException: Annotator processing failed.    
    at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:401)
    at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:308)
    at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:570)
    at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.<init>(ASB_impl.java:412)
    at org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:344)
    at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:265)
    at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:269)
    at org.apache.uima.fit.pipeline.SimplePipeline.runPipeline(SimplePipeline.java:150)
    at de.fraunhofer.fkie.re_analysis.RA_pipeline.main(RA_pipeline.java:107)
    Caused by: java.lang.NullPointerException: charsetName
    at java.io.InputStreamReader.<init>(InputStreamReader.java:99)
    at org.annolab.tt4j.TreeTaggerWrapper$Reader.<init>(TreeTaggerWrapper.java:946)
    at org.annolab.tt4j.TreeTaggerWrapper.process(TreeTaggerWrapper.java:598)
    at de.tudarmstadt.ukp.dkpro.core.treetagger.TreeTaggerChunker.process(TreeTaggerChunker.java:293)
    at org.apache.uima.analysis_component.JCasAnnotator_ImplBase.process(JCasAnnotator_ImplBase.java:48)
    at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:385)
    ... 8 more

I suppose I should specify the parameter "Chunk_Mapping_Location", but I don't know which file it should point to. The chunker is initialised as follows:

    AnalysisEngineDescription chunker =
        AnalysisEngineFactory.createEngineDescription(
            TreeTaggerChunker.class,
            TreeTaggerChunker.PARAM_EXECUTABLE_PATH, "C:/TreeTagger/bin/tree-tagger.exe",
            TreeTaggerChunker.PARAM_MODEL_LOCATION, "C:/TreeTagger/lib/german-chunker-utf8.par",
            TreeTaggerChunker.PARAM_PERFORMANCE_MODE, true,
            TreeTaggerChunker.PARAM_PRINT_TAGSET, true,
            TreeTaggerChunker.PARAM_LANGUAGE, "de"
        );

Upvotes: 0

Views: 67

Answers (1)

rec

Reputation: 10895

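Looks like TreeTaggerChunker is missing PARAM_MODEL_ENCODING, which prevents it from being used with directly specified models: with no encoding configured, a null charset name ends up being passed to InputStreamReader, hence the NullPointerException in your stack trace. I have opened an issue.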
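
To illustrate where the "charsetName" message comes from, here is a minimal sketch that uses only the JDK and is independent of TreeTagger, tt4j, and DKPro Core. The class name CharsetNameDemo and the helper throwsNpe are made up for this example. It only shows that constructing an InputStreamReader with a null charset name fails in the way the stack trace reports; it does not reproduce the wrapper's actual code.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;

// Hypothetical demo class; not part of tt4j or DKPro Core.
class CharsetNameDemo {

    // Returns true if constructing the reader with the given charset name
    // fails with a NullPointerException, false if construction succeeds.
    static boolean throwsNpe(String charsetName) throws UnsupportedEncodingException {
        InputStream in = new ByteArrayInputStream(new byte[0]);
        try {
            new InputStreamReader(in, charsetName);
            return false;
        } catch (NullPointerException e) {
            System.out.println("NullPointerException: " + e.getMessage());
            return true;
        }
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Encoding never configured: the charset name is null.
        System.out.println(throwsNpe(null));
        // Encoding given explicitly: the reader is created normally.
        System.out.println(throwsNpe("UTF-8"));
    }
}
```

So the failure is about a missing encoding value rather than a missing mapping file, which is why setting "Chunk_Mapping_Location" would probably not help here.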

You can get around this by packaging the TreeTagger models as JARs using the build.xml Ant script included with DKPro Core. The process is described in the DKPro Core developer documentation.

Disclosure: I am one of the DKPro Core developers.

Upvotes: 0
