colijuli

Reputation: 11

How can I get a GrammaticalStructure object for a German sentence using the Stanford Parser?

I am using the Stanford Parser (Version 3.5.2) for an NLP application that relies on the analysis of dependency parses as well as information from other sources. So far, I've gotten it to work for English, like so:

import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

import edu.stanford.nlp.ling.HasWord;
import edu.stanford.nlp.ling.TaggedWord;
import edu.stanford.nlp.process.Tokenizer;
import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
import edu.stanford.nlp.trees.GrammaticalStructure;
import edu.stanford.nlp.trees.GrammaticalStructureFactory;
import edu.stanford.nlp.trees.TreebankLanguagePack;
import edu.stanford.nlp.trees.TypedDependency;


/**
* Stanford Parser Wrapper (for Stanford Parser Version 3.5.2).
* 
*/

public class StanfordParserWrapper {

    public static void parse(String en, String align, String out) {

        // set up the Stanford parser
        String grammar = "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz";
        String[] options = { "-outputFormat", "wordsAndTags, typedDependencies" };
        LexicalizedParser lp = LexicalizedParser.loadModel(grammar, options);
        TreebankLanguagePack tlp = lp.getOp().langpack();
        GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();

        // read document
        Iterable<List<? extends HasWord>> sentences;
        Reader r = new Reader(en);
        String line = null;
        List<List<? extends HasWord>> tmp = new ArrayList<List<? extends HasWord>>();
        while ((line = r.getNext()) != null) {
            Tokenizer<? extends HasWord> token = tlp.getTokenizerFactory()
                .getTokenizer(new StringReader(line));
            List<? extends HasWord> sentence = token.tokenize();
            tmp.add(sentence);
        }
        sentences = tmp;

        Reader alignment = new Reader(align);
        Writer treeWriter = new Writer(out);

        // parse
        long start = System.currentTimeMillis();
        int sentID = 0;
        for (List<? extends HasWord> sentence : sentences) {
            Tree t = new Tree();
            t.setSentID(++sentID);
            System.out.println("parse Sentence " + t.getSentID() + " "
                + sentence + "...");

            edu.stanford.nlp.trees.Tree parse = lp.parse(sentence);

            // ROOT node
            Node root = new Node(true, true);
            t.setNode(root);

            // tagging
            int counter = 0;
            for (TaggedWord tw : parse.taggedYield()) {
                Node n = new Node();
                n.setNodeID(++counter);
                n.setSurface(tw.value());
                n.setTag(tw.tag());
                t.setNode(n);
            }

            t.setSentLength(t.getNodes().size() - 1);

            // labeling
            GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
            List<TypedDependency> tdl = gs.typedDependenciesCCprocessed();
            for (TypedDependency td : tdl) {
                Node dep = t.getNodes().get(td.dep().index());
                Node gov = t.getNodes().get(td.gov().index());
                dep.setLabel(td.reln().toString());
                gov.setChild(dep);
                dep.setParent(gov);
            }

            // combine with alignment
            t.initialize(alignment.readNextAlign());
            treeWriter.write(t);
        }
        long stop = System.currentTimeMillis();
        System.err.println("...done! [" + (stop - start) / 1000 + " sec].");

        treeWriter.close();
    }

    public static void main(String[] args) {
        if (args.length == 3) {
            parse(args[0], args[1], args[2]);
        } else {
            System.out.println("Usage: StanfordParserWrapper <input> <alignment> <output>");
        }
    }
}

"Node" and "Tree" are my own classes, not those of the Stanford parser.

My question is this: How can I do the same thing for German? When I replace the English grammar model with "edu/stanford/nlp/models/lexparser/germanPCFG.ser.gz", I get the following exception:

Exception in thread "main" java.lang.UnsupportedOperationException: No GrammaticalStructureFactory defined for edu.stanford.nlp.trees.international.negra.NegraPennLanguagePack
at edu.stanford.nlp.trees.AbstractTreebankLanguagePack.grammaticalStructureFactory(AbstractTreebankLanguagePack.java:591)
at StanfordParserWrapper.parse(StanfordParserWrapper.java:46)
at StanfordParserWrapper.main(StanfordParserWrapper.java:117)
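For reference, the factory lookup that throws can be guarded so the program fails with a clear message instead of a stack trace. This is a minimal sketch against the 3.5.x API, assuming the parser jar and models are on the classpath; it relies on `supportsGrammaticalStructures()` from the `TreebankLanguagePack` interface, which the English pack answers with true and `NegraPennLanguagePack` with false:

```java
import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
import edu.stanford.nlp.trees.GrammaticalStructureFactory;
import edu.stanford.nlp.trees.TreebankLanguagePack;

public class GsfGuard {
    public static void main(String[] args) {
        String grammar = "edu/stanford/nlp/models/lexparser/germanPCFG.ser.gz";
        LexicalizedParser lp = LexicalizedParser.loadModel(grammar);
        TreebankLanguagePack tlp = lp.getOp().langpack();

        // Check before calling grammaticalStructureFactory(), which would
        // otherwise throw UnsupportedOperationException for German
        if (tlp.supportsGrammaticalStructures()) {
            GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
            System.out.println("Dependency conversion available for "
                + tlp.getClass().getSimpleName());
        } else {
            System.err.println("No GrammaticalStructureFactory for "
                + tlp.getClass().getSimpleName()
                + "; dependency output is not supported for this language pack.");
        }
    }
}
```

This doesn't make German dependencies work, but it documents the limitation at the point where it bites.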

The same thing happens with the "germanFactored" model. Obviously, I need to do something different here, as the German model doesn't support GrammaticalStructureFactory. Is there some way to still get a GrammaticalStructure from German text, or do I have to write my code for German completely differently? If so, I'd be grateful for some pointers; I've searched for this quite a bit but couldn't find what I was looking for.

This seems relevant: "How to parse languages other than English with Stanford Parser? in java, not command lines". However, it only tells me that GrammaticalStructureFactory IS supported for the Chinese models, not what I need to do for German parsing.

Thanks a lot,

J

Upvotes: 1

Views: 768

Answers (1)

rec

Reputation: 10895

You don't. The Stanford parser doesn't support dependency analysis (which is what you get from the GrammaticalStructureFactory) for German.

You can try alternative dependency parsers. While Stanford uses a rule-based transformation of the constituent tree into a dependency tree, the alternatives are typically probabilistic.

  • mate-tools has a dependency parser and a model for German
  • you might roll your own with MaltParser (I think there are versions of the TüBa-D/Z corpus that are compatible with MaltParser)
  • or you could look into ParZu (but beware, it's Prolog)

Upvotes: 2
