Stan Murdoch

Reputation: 31

Build a Part-of-Speech Tagger (POS Tagger)

I need to build a POS tagger in Java and need to know how to get started. Are there code examples or other resources that help illustrate how POS taggers work?

Upvotes: 3

Views: 8285

Answers (3)

user439521

Reputation: 670

A few POS and NER taggers are widely used.

OpenNLP Maxent POS taggers: Using Apache OpenNLP.

Apache OpenNLP is a powerful Java NLP library. It provides various NLP tools, one of which is a part-of-speech (POS) tagger. POS taggers are typically used to uncover the grammatical structure of text. You start from a tagged dataset in which each word (part of a phrase) carries a label, build an NLP model from that dataset, and then use the model to generate a tag for each word in new text.
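To make the train-then-tag idea concrete, here is a minimal sketch in plain Java with no OpenNLP dependency. It is a toy most-frequent-tag baseline, not how OpenNLP works internally: it only counts which tag each word carried in the training sentences and reuses the most common one. The class name MostFrequentTagBaseline and the fallback tag are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Toy baseline (illustrative only): tag each word with its most frequent training tag. */
final class MostFrequentTagBaseline {
    // word (lower-cased) -> tag -> number of times that tag was seen for the word
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();
    private final String fallbackTag;

    MostFrequentTagBaseline(String fallbackTag) {
        this.fallbackTag = fallbackTag;
    }

    /** Records one tagged sentence; words[i] is labelled with tags[i]. */
    void train(String[] words, String[] tags) {
        if (words.length != tags.length) {
            throw new IllegalArgumentException("expected one tag per word");
        }
        for (int i = 0; i < words.length; i++) {
            counts.computeIfAbsent(words[i].toLowerCase(Locale.ROOT), k -> new HashMap<>())
                  .merge(tags[i], 1, Integer::sum);
        }
    }

    /** Returns one tag per word; words never seen in training get the fallback tag. */
    String[] tag(String[] words) {
        String[] result = new String[words.length];
        for (int i = 0; i < words.length; i++) {
            Map<String, Integer> seen = counts.get(words[i].toLowerCase(Locale.ROOT));
            String best = fallbackTag;
            int bestCount = 0;
            if (seen != null) {
                // Pick the tag with the highest count; ties are broken arbitrarily.
                for (Map.Entry<String, Integer> e : seen.entrySet()) {
                    if (e.getValue() > bestCount) {
                        best = e.getKey();
                        bestCount = e.getValue();
                    }
                }
            }
            result[i] = best;
        }
        return result;
    }
}
```

You would call train once per tagged sentence and then tag on a new token array. A baseline like this ignores context entirely, so an ambiguous word always gets the same tag; a trained statistical tagger such as OpenNLP's maxent tagger also looks at the surrounding words.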

Sample code:

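import java.util.Arrays;
import java.util.List;
import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.util.Sequence;

public void doTagging(POSModel model, String input) {
    // Naive whitespace tokenization; the tagger expects one token per array element
    String[] tokens = input.trim().split("\\s+");
    POSTaggerME tagger = new POSTaggerME(model);
    // topKSequences returns several candidate tag sequences, not just the best one
    Sequence[] sequences = tagger.topKSequences(tokens);
    for (Sequence s : sequences) {
        List<String> tags = s.getOutcomes();
        System.out.println(Arrays.asList(tokens) + " => " + tags);
    }
}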

Detailed blog with the full code on how to use it:

https://dataturks.com/blog/opennlp-pos-tagger-training-java-example.php?s=so

Stanford CoreNLP based NER tagger:

Stanford CoreNLP is among the most battle-tested NLP libraries out there. In a way, it is the gold standard of NLP performance today. Among its many features, the library supports named entity recognition (NER), which tags important entities in a piece of text, such as the names of people and places.

Sample code:

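import edu.stanford.nlp.ie.crf.CRFClassifier;
import edu.stanford.nlp.ling.CoreLabel;

public void doTagging(CRFClassifier<CoreLabel> model, String input) {
  input = input.trim();
  // classifyToString returns the text with each token annotated with its entity label
  System.out.println(input + " => " + model.classifyToString(input));
}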

Detailed blog with the full code on how to use it:

https://dataturks.com/blog/stanford-core-nlp-ner-training-java-example.php?s=so

Upvotes: 2

wcolen

Reputation: 1431

Try Apache OpenNLP. It includes a POS tagger. You can download ready-to-use English models from here.

The documentation provides details about how to use it from a Java application. Basically you need the following:

Load the POS model

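import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;

// Declared outside the try block so the tagger below can use it
POSModel model = null;

// try-with-resources closes the stream automatically
try (InputStream modelIn = new FileInputStream("en-pos-maxent.bin")) {
  model = new POSModel(modelIn);
}
catch (IOException e) {
  // Model loading failed, handle the error
  e.printStackTrace();
}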

Instantiate the POS tagger

POSTaggerME tagger = new POSTaggerME(model);

Execute it

String[] sent = {"Most", "large", "cities", "in", "the", "US", "had", "morning", "and", "afternoon", "newspapers", "."};
String[] tags = tagger.tag(sent);
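The returned array lines up with the input: tags[i] is the tag for sent[i]. A small helper, sketched here in plain Java, pairs them up for printing. The class name TaggedSentence and the token/TAG layout are illustrative choices, not part of OpenNLP.

```java
import java.util.StringJoiner;

/** Hypothetical helper, not part of OpenNLP: pairs tokens with their tags. */
final class TaggedSentence {
    private TaggedSentence() {}

    /** Joins tokens[i] and tags[i] as "token/TAG", separated by single spaces. */
    static String format(String[] tokens, String[] tags) {
        if (tokens.length != tags.length) {
            throw new IllegalArgumentException("expected one tag per token");
        }
        StringJoiner joined = new StringJoiner(" ");
        for (int i = 0; i < tokens.length; i++) {
            joined.add(tokens[i] + "/" + tags[i]);
        }
        return joined.toString();
    }
}
```

With the arrays from the snippet above, System.out.println(TaggedSentence.format(sent, tags)) prints the whole sentence on one line with each token followed by its tag.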

Note that the POS tagger expects a tokenized sentence, that is, one array element per token with punctuation split off. Apache OpenNLP also provides tools and models to help with sentence detection and tokenization.
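To show what "tokenized" means in practice, here is a crude regex-based sketch in plain Java. It is only a stand-in for illustration and is not OpenNLP's tokenizer API; the class name CrudeTokenizer is made up. It treats every run of letters or digits as one token and every other non-space character as its own token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Crude illustrative tokenizer; a stand-in for a real tokenizer, not OpenNLP's API. */
final class CrudeTokenizer {
    // A token is either a run of letters/digits or a single other non-space character.
    private static final Pattern TOKEN =
            Pattern.compile("[\\p{L}\\p{N}]+|[^\\p{L}\\p{N}\\s]");

    private CrudeTokenizer() {}

    /** Splits a sentence into word and punctuation tokens, dropping whitespace. */
    static String[] tokenize(String sentence) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(sentence);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens.toArray(new String[0]);
    }
}
```

Applied to "Most large cities in the US had morning and afternoon newspapers." this yields the same twelve tokens as the hand-written array above, with the final period as its own token. It will mishandle contractions and abbreviations such as "U.S.", which is one reason to prefer a trained tokenizer for real text.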

If you have to train your own model, refer to this documentation.

Upvotes: 6

Andrey

Reputation: 6766

You can examine existing tagger implementations.

For example, see the Stanford University POS tagger in Java (by Kristina Toutanova). It is available under the GNU General Public License (v2 or later), and its source code is well written and clearly documented:

http://nlp.stanford.edu/software/tagger.shtml

A good book on tagging is Speech and Language Processing (2nd Edition) by Daniel Jurafsky and James H. Martin.

Upvotes: 5
