Kasun

Reputation: 336

Stanford parser: tagging with financial instruments

I have a set of financial documents (fixed-term deposit documents, credit card documents). I want to automatically identify and tag financial entities/instruments in those documents.

For example, if a document contains the phrase “reserves the right to repay with interest without notice”, I want to identify the financial term related to it and tag the phrase with that term; for this sentence the term is “Callable”. Similarly, the phrase “permit premature withdrawal” relates to the term “Putable”, so if this phrase appears in a document I want to tag it with “Putable”.

The financial terms will come from the Financial Industry Business Ontology (FIBO). Is it possible to use the Stanford parser for this purpose? Can I use the POS tagger for it? I may have to train the Stanford parser on financial instruments; if that is possible, how can I train it to identify financial instruments?

Upvotes: 1

Views: 893

Answers (3)

Vitaly Olegovitch

Reputation: 3547

Running the Stanford CoreNLP pipeline with the POS tagger will turn your text files into annotated XML files. An easy way to run POS tagging and named entity recognition is:

import edu.stanford.nlp.pipeline.StanfordCoreNLP;

public class POSTagging {
  // Declared as "throws Exception" so this compiles whether or not the
  // installed CoreNLP version's main() also declares ClassNotFoundException
  public static void main(String[] args) throws Exception {
    // Tokenize, sentence-split, POS-tag, lemmatize and run NER over every file
    // listed (one path per line) in ./filelist/filelist.txt, writing one
    // annotated XML file per input into ./annotated
    String[] commArgs = {
      "-annotators", "tokenize,ssplit,pos,lemma,ner",
      "-filelist", "./filelist/filelist.txt",
      "-outputFormat", "xml",
      "-outputDirectory", "./annotated"
    };
    StanfordCoreNLP.main(commArgs);
  }
}

Once you have run this, you will have your annotated XML files, which you then have to parse with JAXP or an equivalent XML API.
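A minimal sketch of that parsing step with the standard JAXP DOM API, assuming the usual CoreNLP XML layout in which each `<token>` element has `<word>`, `<POS>` and `<NER>` children. The class name `ReadAnnotatedXml` and the inline sample document are illustrative only; for real output you would pass a `java.io.File` from `./annotated` to `builder.parse` instead of a string.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ReadAnnotatedXml {
  /** Returns "word/POS/NER" for every <token> element in a CoreNLP-style XML string. */
  static List<String> tokens(String xml) throws Exception {
    DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    Document doc = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    NodeList tokenNodes = doc.getElementsByTagName("token");
    List<String> result = new ArrayList<>();
    for (int i = 0; i < tokenNodes.getLength(); i++) {
      Element token = (Element) tokenNodes.item(i);
      result.add(text(token, "word") + "/" + text(token, "POS") + "/" + text(token, "NER"));
    }
    return result;
  }

  // Text content of the first descendant element with the given tag, or "" if absent
  private static String text(Element parent, String tag) {
    NodeList nodes = parent.getElementsByTagName(tag);
    return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent();
  }

  public static void main(String[] args) throws Exception {
    // Hand-written sample in the CoreNLP XML shape (not real pipeline output)
    String xml = "<root><document><sentences><sentence id=\"1\"><tokens>"
        + "<token id=\"1\"><word>permit</word><POS>VB</POS><NER>O</NER></token>"
        + "<token id=\"2\"><word>withdrawal</word><POS>NN</POS><NER>O</NER></token>"
        + "</tokens></sentence></sentences></document></root>";
    System.out.println(tokens(xml)); // prints [permit/VB/O, withdrawal/NN/O]
  }
}
```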

Upvotes: 1

Christopher Manning

Reputation: 9450

A parser or part-of-speech tagger out of the box will not identify domain-specific concepts such as these. However, the natural language analysis they provide may be a useful building block for a solution. Alternatively, if the phrases you need to identify are close enough to fixed phrases, such tools may be unnecessary, and you should concentrate on finding the fixed phrases and classifying them.
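A minimal sketch of the fixed-phrase route, assuming the phrases appear almost verbatim: a hand-written map from phrase to FIBO term, applied after lower-casing the text and collapsing whitespace. The two entries are the examples from the question; the class name `FixedPhraseTagger` is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class FixedPhraseTagger {
  // Hypothetical hand-written mapping from fixed phrase to FIBO-style term
  static final Map<String, String> PHRASE_TO_TERM = new LinkedHashMap<>();
  static {
    PHRASE_TO_TERM.put("reserves the right to repay with interest without notice", "Callable");
    PHRASE_TO_TERM.put("permit premature withdrawal", "Putable");
  }

  /** Returns the term for every known phrase found in the text (case- and whitespace-insensitive). */
  static List<String> tag(String text) {
    // Lower-case and collapse whitespace runs so line breaks in the document don't block a match
    String normalized = text.toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
    List<String> terms = new ArrayList<>();
    for (Map.Entry<String, String> e : PHRASE_TO_TERM.entrySet()) {
      if (normalized.contains(e.getKey())) {
        terms.add(e.getValue());
      }
    }
    return terms;
  }

  public static void main(String[] args) {
    String doc = "The Bank reserves the right to repay\nwith interest without notice.";
    System.out.println(tag(doc)); // prints [Callable]
  }
}
```

Plain substring matching misses inflected or reworded variants such as “permits premature withdrawals”, which is where the linguistic annotations mentioned above become useful.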

While these are not "named entities", the problem is close to named entity recognition, in that you are recognizing semantic phrase classes. You could either annotate examples of the phrases you wish to find and train a named entity recognizer model (e.g., Stanford NER), or write rules that match instances (using something like ANNIE in GATE or Stanford's TokensRegex).
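A minimal sketch of the rule-based option with TokensRegex (`TokenSequencePattern`). The pattern is a hypothetical rule written for the “Callable” example from the question: the words “reserves the right to repay”, up to four arbitrary tokens, then “without notice”. It only uses the `tokenize` and `ssplit` annotators, so no model files are needed; a realistic rule set would need many such patterns and might match on `lemma` rather than `word`. The class name `CallableRule` is illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.ling.tokensregex.TokenSequenceMatcher;
import edu.stanford.nlp.ling.tokensregex.TokenSequencePattern;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

public class CallableRule {
  // Hypothetical rule: "reserve(s) the right to repay", 0-4 arbitrary tokens, "without notice"
  static final TokenSequencePattern CALLABLE = TokenSequencePattern.compile(
      "[{word:/reserves?/}] [{word:/the/}] [{word:/right/}] [{word:/to/}] [{word:/repay/}]"
      + " []{0,4} [{word:/without/}] [{word:/notice/}]");

  // Tokenizer-only pipeline, built once because construction is relatively expensive
  static final StanfordCoreNLP PIPELINE;
  static {
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize,ssplit");
    PIPELINE = new StanfordCoreNLP(props);
  }

  /** Returns the matched text of every span the rule fires on; each one would be tagged "Callable". */
  static List<String> findCallable(String text) {
    Annotation doc = new Annotation(text);
    PIPELINE.annotate(doc);
    List<CoreLabel> tokens = doc.get(CoreAnnotations.TokensAnnotation.class);
    TokenSequenceMatcher matcher = CALLABLE.getMatcher(tokens);
    List<String> hits = new ArrayList<>();
    while (matcher.find()) {
      hits.add(matcher.group());
    }
    return hits;
  }

  public static void main(String[] args) {
    System.out.println(findCallable("The bank reserves the right to repay with interest without notice."));
  }
}
```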

Upvotes: 7

Bhavik Ambani

Reputation: 6657

You have to parse each whole sentence in which you want to identify the terms, then tokenize it and identify the nouns, verbs, etc.

You can use the sample output displayed here as a guide. With it, you can parse the text and identify the terms by matching them against a dictionary of terms that you will have to develop yourself.

You can also use its API here.
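A minimal sketch of the dictionary approach through the CoreNLP Java API, assuming CoreNLP 3.9 or later (for `CoreDocument`), Java 9 or later (for `Map.of`/`List.of`) and the English models jar on the classpath (the `pos` annotator, which `lemma` depends on, needs it). Matching on lemma sequences rather than raw words lets “permits premature withdrawals” hit the same entry as “permit premature withdrawal”. The dictionary contents and the class name `LemmaDictionaryTagger` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Properties;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.CoreDocument;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

public class LemmaDictionaryTagger {
  // Hypothetical hand-built dictionary keyed on lemma sequences
  static final Map<List<String>, String> DICTIONARY =
      Map.of(List.of("permit", "premature", "withdrawal"), "Putable");

  // pos is required by lemma and needs the CoreNLP English models jar
  static final StanfordCoreNLP PIPELINE;
  static {
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize,ssplit,pos,lemma");
    PIPELINE = new StanfordCoreNLP(props);
  }

  /** Returns one term per occurrence of a dictionary lemma sequence in the text. */
  static List<String> tag(String text) {
    CoreDocument doc = new CoreDocument(text);
    PIPELINE.annotate(doc);
    List<String> lemmas = new ArrayList<>();
    for (CoreLabel token : doc.tokens()) {
      lemmas.add(token.lemma().toLowerCase(Locale.ROOT));
    }
    List<String> terms = new ArrayList<>();
    // Slide each dictionary phrase over the document's lemma sequence
    for (Map.Entry<List<String>, String> entry : DICTIONARY.entrySet()) {
      List<String> phrase = entry.getKey();
      for (int i = 0; i + phrase.size() <= lemmas.size(); i++) {
        if (lemmas.subList(i, i + phrase.size()).equals(phrase)) {
          terms.add(entry.getValue());
        }
      }
    }
    return terms;
  }

  public static void main(String[] args) {
    System.out.println(tag("The account permits premature withdrawals."));
  }
}
```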

Hope this helps.

Upvotes: 2
