Richard

Reputation: 8935

Java OpenNLP extract all nouns from a sentence

I am using Java 8 and OpenNLP. I am trying to extract all nouns from sentences.

I have tried this example, but it extracts whole noun phrases ("NP"). Does anyone know how I can extract just the individual nouns?

Thanks

Upvotes: 0

Views: 1849

Answers (1)

Igor

Reputation: 1281

What have you tried so far? I haven't looked at the example you linked to in much detail, but I'm fairly sure you could get what you want by modifying it. In any case, it isn't difficult: run a POS tagger over the tokens and keep the tokens whose tag marks a noun.

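import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.tokenize.SimpleTokenizer;

public class NounExtractor {
    public static void main(String[] args) {
        // try-with-resources closes the model stream after the model is loaded
        try (InputStream modelIn = new FileInputStream("<location to your tagger model here>")) {
            POSModel posModel = new POSModel(modelIn);
            POSTaggerME tagger = new POSTaggerME(posModel);
            String[] tokens = SimpleTokenizer.INSTANCE.tokenize("This is a sample sentence.");
            String[] tags = tagger.tag(tokens); // one Penn Treebank tag per token
            for (int i = 0; i < tags.length; i++) {
                // NN, NNS, NNP and NNPS are all noun tags, so match on the "NN" prefix
                if (tags[i].startsWith("NN")) {
                    System.out.println(tokens[i]);
                }
            }
        } catch (IOException e) {
            // rethrow as unchecked; replace with your own error handling
            throw new UncheckedIOException(e);
        }
    }
}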

You can download pre-trained tagger models (for English, e.g. en-pos-maxent.bin) here: http://opennlp.sourceforge.net/models-1.5/

I should also say that SimpleTokenizer's public constructor is deprecated; use SimpleTokenizer.INSTANCE instead. You may want to look into a more sophisticated tokenizer, such as the model-based TokenizerME, but in my experience the fancier OpenNLP tokenizers are also a lot slower (in general, unacceptably slow for plain tokenisation).
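One detail worth knowing: the English OpenNLP POS models use the Penn Treebank tag set, which has four noun tags: NN (singular or mass), NNS (plural), NNP (proper, singular) and NNPS (proper, plural). Comparing each tag only against "nn" therefore drops plural and proper nouns. Below is a minimal sketch of just the filtering step as a plain Java helper with no OpenNLP dependency. The class name NounFilter and the method extractNouns are hypothetical, and the tags in main are hand-written for illustration rather than real tagger output.

```java
import java.util.ArrayList;
import java.util.List;

public class NounFilter {

    /**
     * Returns the tokens whose tag is a Penn Treebank noun tag
     * (NN, NNS, NNP, NNPS). Assumes tokens[i] and tags[i] are aligned,
     * as they are when the tags come from tagging that same token array.
     */
    public static List<String> extractNouns(String[] tokens, String[] tags) {
        if (tokens.length != tags.length) {
            throw new IllegalArgumentException("tokens and tags must have the same length");
        }
        List<String> nouns = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            // All four Penn Treebank noun tags share the "NN" prefix,
            // and no other tag in that set starts with it.
            if (tags[i].startsWith("NN")) {
                nouns.add(tokens[i]);
            }
        }
        return nouns;
    }

    public static void main(String[] args) {
        // Hand-written tags for illustration; in practice they come from POSTaggerME.tag(tokens).
        String[] tokens = {"Richard", "likes", "apples", "and", "a", "sandwich", "."};
        String[] tags = {"NNP", "VBZ", "NNS", "CC", "DT", "NN", "."};
        System.out.println(extractNouns(tokens, tags));
    }
}
```

Running main prints [Richard, apples, sandwich]. Keeping the filter separate from the tagger means it can be tested without loading a model. If you only want proper nouns, matching the "NNP" prefix instead covers both NNP and NNPS.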

Upvotes: 1
