Paperless

Reputation: 45

Tagging words and sentences using the Stanford CoreNLP library fails

      // Tagger: args[0] is the tagger model path, args[1] is the input text file
      MaxentTagger tagger = new MaxentTagger(args[0]);
      TokenizerFactory<CoreLabel> ptbTokenizerFactory = PTBTokenizer.factory(new CoreLabelTokenFactory(),
                                   "untokenizable=noneKeep");
      BufferedReader r = new BufferedReader(new InputStreamReader(new FileInputStream(args[1]), "utf-8"));
      PrintWriter pw = new PrintWriter(new OutputStreamWriter(System.out, "utf-8"));
      // Split the input into sentences and tokenize each one
      DocumentPreprocessor documentPreprocessor = new DocumentPreprocessor(r);
      documentPreprocessor.setTokenizerFactory(ptbTokenizerFactory);
      for (List<HasWord> sentence : documentPreprocessor) {
        List<TaggedWord> tSentence = tagger.tagSentence(sentence);
        pw.println(Sentence.listToString(tSentence, false));
      }
      // Flush buffered output; this PrintWriter does not auto-flush
      pw.close();

It fails with the following exception:

Reading POS tagger model from C:\work\development\workspace\stanfordnlp\sample.txt ...

C:\work\development\workspace\stanfordnlp\sample.txtException in thread "main" edu.stanford.nlp.io.RuntimeIOException: Error while loading a tagger model (probably missing model file)
    at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:869)
    at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:767)
    at edu.stanford.nlp.tagger.maxent.MaxentTagger.<init>(MaxentTagger.java:298)
    at edu.stanford.nlp.tagger.maxent.MaxentTagger.<init>(MaxentTagger.java:263)
    at phoenix.TokenizerDemo.main(TokenizerDemo.java:42)
Caused by: java.io.StreamCorruptedException: invalid stream header: 416E6F74
    at java.io.ObjectInputStream.readStreamHeader(Unknown Source)
    at java.io.ObjectInputStream.<init>(Unknown Source)
    at edu.stanford.nlp.tagger.maxent.TaggerConfig.readConfig(TaggerConfig.java:748)
    at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:804)
    ... 4 more

Upvotes: 1

Views: 420

Answers (1)

Jon Gauthier

Reputation: 25572

The log should clearly indicate the problem:

Reading POS tagger model from C:\work\development\workspace\stanfordnlp\sample.txt ...

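You are instantiating MaxentTagger incorrectly. When you pass a single string argument to the constructor, that string is expected to be the path to a tagger model file. Here args[0] is sample.txt, which appears to be your input text rather than a model, so deserialization fails: the bytes 41 6E 6F 74 in "invalid stream header: 416E6F74" are the ASCII characters "Anot", i.e. the beginning of a plain text file. Pass the path to a tagger model file as the first argument and the text file as the second.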
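To see what the tagger actually read, the hex value in "invalid stream header: 416E6F74" can be decoded by hand. The following is only a small diagnostic sketch using the standard library; the class and method names are made up for illustration and are not part of CoreNLP.

    import java.nio.charset.StandardCharsets;

    // Hypothetical helper: decodes the hex header reported by StreamCorruptedException.
    final class StreamHeaderDecoder {

        /** Converts a hex string such as "416E6F74" into its raw bytes. */
        static byte[] hexToBytes(String hex) {
            byte[] out = new byte[hex.length() / 2];
            for (int i = 0; i < out.length; i++) {
                out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
            }
            return out;
        }

        /** Interprets the header bytes as ASCII text. */
        static String asAscii(String hex) {
            return new String(hexToBytes(hex), StandardCharsets.US_ASCII);
        }

        public static void main(String[] args) {
            // Header value copied from the exception message in the question
            System.out.println(asAscii("416E6F74")); // prints "Anot"
        }
    }

A header that decodes to readable text is a strong hint that a plain text file was passed where a binary model file was expected.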

See the documentation for MaxentTagger for more information.
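If you want to catch swapped arguments before the tagger is constructed, a pre-flight check along these lines may help. It assumes the model file is an uncompressed Java-serialized stream, which is what the ObjectInputStream frames in the stack trace suggest for this version; other model formats (for example compressed ones) would need a different check. ModelPathCheck and its methods are hypothetical names, not CoreNLP API.

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.ObjectStreamConstants;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    // Hypothetical pre-flight check; not part of CoreNLP.
    final class ModelPathCheck {

        /** True if the bytes begin with the Java serialization magic number 0xACED. */
        static boolean startsWithSerializationMagic(byte[] head) {
            if (head.length < 2) {
                return false;
            }
            int magic = ((head[0] & 0xFF) << 8) | (head[1] & 0xFF);
            return magic == (ObjectStreamConstants.STREAM_MAGIC & 0xFFFF);
        }

        /** Reads the first two bytes of the file and checks them for the magic number. */
        static boolean looksLikeSerializedModel(Path file) throws IOException {
            try (InputStream in = Files.newInputStream(file)) {
                int first = in.read();
                int second = in.read();
                if (first < 0 || second < 0) {
                    return false; // file is shorter than two bytes
                }
                return startsWithSerializationMagic(new byte[] {(byte) first, (byte) second});
            }
        }

        public static void main(String[] args) throws IOException {
            if (args.length < 2) {
                System.err.println("usage: ModelPathCheck <tagger-model-file> <text-file>");
                return;
            }
            if (!looksLikeSerializedModel(Paths.get(args[0]))) {
                System.err.println(args[0] + " does not look like a serialized model; are the arguments swapped?");
                return;
            }
            // At this point args[0] could be handed to new MaxentTagger(args[0]).
            System.out.println("Model path looks plausible: " + args[0]);
        }
    }

Run with the model path first and the text file second, in the same order the question's program expects.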

Upvotes: 1
