BlueGirl

Reputation: 501

Extracting specific nodes from Stanford dependency parser output

I want to find certain nodes in the output of the Stanford dependency parser, for example:

Sentence: Microsoft ad says that Macs are too cool for its customers.

Dependencies:

 - compound(ad-2, Microsoft-1)
 - nsubj(says-3, ad-2)
 - root(ROOT-0, says-3)
 - mark(cool-8, that-4)
 - nsubj(cool-8, Macs-5)
 - cop(cool-8, are-6)
 - advmod(cool-8, too-7)
 - ccomp(says-3, cool-8)
 - case(customers-11, for-9)
 - nmod:poss(customers-11, its-10)
 - nmod:for(cool-8, customers-11)

I'd like to capture the following constructs:

p1={Node with two outgoing edges with labels "nsubj" and "ccomp"},

In its dependency tree, `says` satisfies this condition, so p1={says}

and

s1={ n1={Node connected to p1 by an edge with label "nsubj"},
Node connected to n1 by an edge with label "nn" or "quantmod"}

In its dependency tree s1={n1=ad, Microsoft}

I don't know how to extract these nodes. I tried the following check to extract `ad`, but it also matches `Macs`. I have no idea how to extract the other nodes. Any help would be greatly appreciated.

typedDependency.reln().getShortName().equals("nsubj")

Here is my code:

// Fragment from inside a method; the imports it relies on are listed for reference
import java.util.Collection;
import edu.stanford.nlp.trees.*;
import edu.stanford.nlp.trees.TreeCoreAnnotations.TreeAnnotation;

Tree tree = sentence.get(TreeAnnotation.class);
// Get dependency tree
TreebankLanguagePack tlp = new PennTreebankLanguagePack();
GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
GrammaticalStructure gs = gsf.newGrammaticalStructure(tree);
Collection<TypedDependency> td = gs.typedDependenciesCollapsed();
System.out.println(td);
System.out.println(td.size());

// Iterate over the dependencies directly; no array copy or cast is needed
for (TypedDependency typedDependency : td) {
    System.out.println("Dependent  " + typedDependency.dep() + " :: " + "Relation  " + typedDependency.reln());

    if (typedDependency.reln().getShortName().equals("nsubj")) {
        // ????
    }
}

Upvotes: 2

Views: 279

Answers (2)

vpekar

Reputation: 3355

Each typed dependency connects a dependent and a head (the governor). For the first construct, iterate over the typed dependencies and record those labelled "nsubj" or "ccomp", together with the index of their head. The index of the head of a typed dependency is accessed as follows:

typedDependency.gov().index()

(typedDependency.dep().index() returns the index of the dependent, not the head.) Then check which nsubj and ccomp dependencies share the same head. In your example, that head is "says".
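To make the grouping step concrete, here is a minimal sketch in plain Java. It does not call the Stanford API: `Dep` is a hypothetical stand-in for `TypedDependency` that only carries the relation name and the head's word and index, and `HeadFinder`, `headsWithAll` and `example` are names invented for this illustration. The data is copied from the dependencies listed in the question.

```java
import java.util.*;

final class HeadFinder {
    // Hypothetical stand-in for TypedDependency: relation label plus the head (governor) word and index.
    record Dep(String reln, String gov, int govIndex) {}

    /** Returns index -> word for every head that governs at least one edge of each required label. */
    static Map<Integer, String> headsWithAll(List<Dep> deps, Set<String> required) {
        Map<Integer, Set<String>> labelsByHead = new HashMap<>();
        Map<Integer, String> wordByHead = new HashMap<>();
        for (Dep d : deps) {
            if (required.contains(d.reln())) {
                // Collect the required labels seen on outgoing edges of this head
                labelsByHead.computeIfAbsent(d.govIndex(), k -> new HashSet<>()).add(d.reln());
                wordByHead.put(d.govIndex(), d.gov());
            }
        }
        Map<Integer, String> result = new TreeMap<>();
        for (Map.Entry<Integer, Set<String>> e : labelsByHead.entrySet()) {
            if (e.getValue().containsAll(required)) {
                result.put(e.getKey(), wordByHead.get(e.getKey()));
            }
        }
        return result;
    }

    // The nsubj and ccomp edges from the question's sentence, plus one unrelated edge.
    static List<Dep> example() {
        return List.of(
            new Dep("nsubj", "says", 3),   // nsubj(says-3, ad-2)
            new Dep("nsubj", "cool", 8),   // nsubj(cool-8, Macs-5)
            new Dep("ccomp", "says", 3),   // ccomp(says-3, cool-8)
            new Dep("advmod", "cool", 8)); // advmod(cool-8, too-7)
    }

    public static void main(String[] args) {
        System.out.println(headsWithAll(example(), Set.of("nsubj", "ccomp"))); // prints {3=says}
    }
}
```

With real `TypedDependency` objects, the label would presumably come from `reln().getShortName()` and the head from `gov()`, as in the question's code.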

For the second construct, you can also use the ids of the heads in typed dependencies to track the connections.
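A rough sketch of that second step, again in plain Java rather than the Stanford API: `Dep` is a hypothetical stand-in for `TypedDependency`, and `SubjectModifiers`, `children` and `s1` are illustrative names only. It assumes the index of p1 (3 for "says") is already known. Note that in the dependencies listed in the question the edge between "ad" and "Microsoft" is labelled "compound", not "nn", so the sketch passes "compound" alongside the two labels from the question.

```java
import java.util.*;

final class SubjectModifiers {
    // Hypothetical stand-in for TypedDependency: label, head index, dependent word and index.
    record Dep(String reln, int govIndex, String dep, int depIndex) {}

    /** Edges that leave the given head and carry one of the given labels. */
    static List<Dep> children(List<Dep> deps, int headIndex, Set<String> labels) {
        List<Dep> out = new ArrayList<>();
        for (Dep d : deps) {
            if (d.govIndex() == headIndex && labels.contains(d.reln())) {
                out.add(d);
            }
        }
        return out;
    }

    /** s1: each nsubj dependent of p1, followed by that dependent's own modifiers. */
    static List<String> s1(List<Dep> deps, int p1Index, Set<String> modifierLabels) {
        List<String> words = new ArrayList<>();
        for (Dep n1 : children(deps, p1Index, Set.of("nsubj"))) {
            words.add(n1.dep());
            for (Dep m : children(deps, n1.depIndex(), modifierLabels)) {
                words.add(m.dep());
            }
        }
        return words;
    }

    // A subset of the dependencies from the question.
    static List<Dep> example() {
        return List.of(
            new Dep("compound", 2, "Microsoft", 1), // compound(ad-2, Microsoft-1)
            new Dep("nsubj", 3, "ad", 2),           // nsubj(says-3, ad-2)
            new Dep("nsubj", 8, "Macs", 5),         // nsubj(cool-8, Macs-5)
            new Dep("ccomp", 3, "cool", 8));        // ccomp(says-3, cool-8)
    }

    public static void main(String[] args) {
        // p1 is "says" at index 3
        System.out.println(s1(example(), 3, Set.of("nn", "quantmod", "compound"))); // prints [ad, Microsoft]
    }
}
```

Because the nsubj lookup is restricted to edges whose head is p1, "Macs" (whose head is "cool") is not picked up, which is the problem the bare `equals("nsubj")` check ran into.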

Upvotes: 0

StanfordNLPHelp

Reputation: 8739

Have you reviewed the slides on Semgrex?

They are available here:

http://nlp.stanford.edu/software/Semgrex.ppt

Some more info on Semgrex:

http://nlp.stanford.edu/software/tregex.shtml

Upvotes: 1
