We are Borg

Reputation: 5313

Java, Stanford NLP : Extract specific speech labels from parser

I recently discovered the Stanford NLP parser and it seems quite amazing. I currently have a working instance of it running in our project, but I am facing the two problems below.

  1. How can I parse text and then extract only specific part-of-speech labels from the parsed data? For example, how can I extract only NNPS and PRP from the sentence?
  2. Our platform works in both English and German, so the text can be in either language. How can I accommodate this scenario? Thank you.

Code :

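    import java.util.List;

    import edu.stanford.nlp.ling.CoreLabel;
    import edu.stanford.nlp.ling.TaggedWord;
    import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
    import edu.stanford.nlp.process.CoreLabelTokenFactory;
    import edu.stanford.nlp.process.PTBTokenizer;
    import edu.stanford.nlp.process.TokenizerFactory;
    import edu.stanford.nlp.trees.Tree;

    private final String PCG_MODEL = "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz";

    private final TokenizerFactory<CoreLabel> tokenizerFactory = PTBTokenizer.factory(new CoreLabelTokenFactory(), "invertible=true");

    public void testParser() {
        // Load the English PCFG model from the classpath
        LexicalizedParser lp = LexicalizedParser.loadModel(PCG_MODEL);
        String sent = "Complete Howto guide to install EC2 Linux server in Amazon Web services cloud.";
        Tree parse = lp.parse(sent);

        // Each TaggedWord pairs a token with its POS tag
        List<TaggedWord> taggedWords = parse.taggedYield();
        System.out.println(taggedWords);
    }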

The above example works, but as you can see, I am loading only the English model. Thank you.

Upvotes: 2

Views: 428

Answers (2)

user7671917

Reputation: 11

Try this:

    for (Tree subTree : parse) { // iterates over every subtree of the sentence's parse tree
        if (subTree.label().value().equals("NNPS")) { // the node's label is NNPS
            // Do what you want
        }
    }
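The same filter can also be applied to the flat list of tagged words rather than to the tree. Below is a minimal, library-free sketch of that idea. `Tagged` is a hypothetical stand-in for Stanford's `TaggedWord`, `keepTags` is an illustrative helper name, and the sample tags are hand-written, not parser output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class PosTagFilter {

    /** Hypothetical stand-in for Stanford's TaggedWord: a token plus its POS tag. */
    static final class Tagged {
        final String word;
        final String tag;

        Tagged(String word, String tag) {
            this.word = word;
            this.tag = tag;
        }

        @Override
        public String toString() {
            return word + "/" + tag;
        }
    }

    /** Keeps only the tokens whose tag is in the wanted set, preserving sentence order. */
    static List<Tagged> keepTags(List<Tagged> tokens, Set<String> wanted) {
        List<Tagged> kept = new ArrayList<>();
        for (Tagged t : tokens) {
            if (wanted.contains(t.tag)) {
                kept.add(t);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hand-tagged sample sentence (illustrative only, not real parser output)
        List<Tagged> sentence = List.of(
                new Tagged("They", "PRP"),
                new Tagged("met", "VBD"),
                new Tagged("the", "DT"),
                new Tagged("Americans", "NNPS"));
        System.out.println(keepTags(sentence, Set.of("NNPS", "PRP")));
    }
}
```

With the real parser, the equivalent would presumably be a loop over `parse.taggedYield()` that compares each element's `tag()` against the wanted set.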

Upvotes: 1

Srikanth Balaji

Reputation: 2718

For Query 1, I don't think Stanford NLP has an option to extract specific POS tags.

However, the same can be achieved using custom-trained models. I tried a similar requirement for NER (named entity recognition) with custom models.
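For Query 2, one possible approach is to pick the model path per language and pass it to `LexicalizedParser.loadModel(...)`. The sketch below assumes the caller already knows the text's language, since the parser itself is not shown detecting it here. It also assumes the German models jar is on the classpath and ships `germanPCFG.ser.gz` under the path shown; please verify that path against your version. `ParserModelChooser` and `modelPathFor` are illustrative names.

```java
import java.util.Locale;

class ParserModelChooser {

    // Path taken from the question's code
    static final String ENGLISH_MODEL = "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz";

    // Assumption: path inside the separate German models jar; check your distribution
    static final String GERMAN_MODEL = "edu/stanford/nlp/models/lexparser/germanPCFG.ser.gz";

    /** Returns the German model path for German locales and the English one otherwise. */
    static String modelPathFor(Locale locale) {
        if (Locale.GERMAN.getLanguage().equals(locale.getLanguage())) {
            return GERMAN_MODEL;
        }
        return ENGLISH_MODEL;
    }

    public static void main(String[] args) {
        System.out.println(modelPathFor(Locale.GERMANY));
        System.out.println(modelPathFor(Locale.US));
    }
}
```

Note that the German models most likely use a different tag set than the English Penn Treebank tags, so labels such as NNPS and PRP may need German equivalents when filtering.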

Upvotes: 0
