NinjaR

Reputation: 623

Generate a three-level dependency when a verb is attached to a non-verb in dependency parsing

I am using dependency parsing in R with the coreNLP package. However, I need to reshape the resulting data frame for a specific use case.

I need a data frame with three columns. I used the code below to get as far as the dependency tree.

devtools::install_github("statsmaths/coreNLP")
coreNLP::downloadCoreNLP()
library(coreNLP)
initCoreNLP()
inp_cl = "generate odd numbers from column one and print."
output = annotateString(inp_cl)
dc = getDependency(output)

 sentence governor dependent      type governorIdx dependentIdx govIndex depIndex
1        1     ROOT  generate      root           0            1       NA        1
2        1  numbers       odd      amod           3            2        3        2
3        1 generate   numbers      dobj           1            3        1        3
4        1   column      from      case           5            4        5        4
5        1 generate    column nmod:from           1            5        1        5
6        1   column       one    nummod           5            6        5        6
7        1   column       and        cc           5            7        5        7
8        1 generate     print nmod:from           1            8        1        8
9        1   column     print  conj:and           5            8        5        8
10       1 generate         .     punct           1            7        1        10

I then merged the POS tags into the dependency table with the following code and ended up with the data frame below.

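# token table; keep only the columns needed below (includes sentence, id, POS)
ps = getToken(output)

ps = ps[,c(1,2,7,3)]

# rename depIndex so it lines up with the token id
colnames(dc)[8] = "id"

# attach the POS tag of the governor
dp = merge(dc, ps[,c("sentence","id","POS")], 
     by.x=c("sentence","governorIdx"),by.y = c("sentence","id"),all.x = TRUE)

# attach the POS tag of the dependent
dp = merge(dp, ps[,c("sentence","id","POS")], 
     by.x=c("sentence","dependentIdx"),by.y = c("sentence","id"),all.x = TRUE)

colnames(dp)[9:10] = c("POS_gov","POS_dep")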


  sentence dependentIdx governorIdx governor dependent      type govIndex id POS_gov POS_dep
1         1            1           0     ROOT  generate      root       NA  1    <NA>      VB
2         1            2           3  numbers       odd      amod        3  2     NNS      JJ
3         1            3           1 generate   numbers      dobj        1  3      VB     NNS
4         1            4           5   column      from      case        5  4      NN      IN
5         1            5           1 generate    column nmod:from        1  5      VB      NN
6         1            6           5   column       one    nummod        5  6      NN      CD
7         1            7           5   column       and        cc        5  7      NN      CC
8         1            8           1 generate     print nmod:from        1  8      VB      NN
9         1            8           5   column     print  conj:and        5  8      NN      NN
10        1            9           1 generate         .     punct        1  9      VB       .

When a verb (action word) is attached to a non-verb (non-action word), and that non-verb is in turn connected to other non-verbs, a single row should capture the entire connection. For example, generate is a verb connected to numbers, and numbers is a non-verb connected to odd.

So the intended data frame needs to be:

Topic1 Topic2 Action
numbers odd    generate
column  from   generate
column  one    generate
column  and    generate
column  from   print
column  one    print
column  and    print
         .     generate

Upvotes: 1

Views: 99

Answers (1)

John R

Reputation: 1508

First, you'll need your dependency parser to tag "print" as a verb rather than a noun.

Try using a sentence with two independent clauses, and see whether the root of the second independent clause is tagged as a verb.

If so, it's a simple walk through the governorIdx column. If not, you'll need to address the mechanics of your dependency tree generator.
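
A minimal sketch of that walk, in base R. It assumes a merged frame shaped like the dp in the question; the toy dp below is copied by hand from the question's output so the snippet runs without coreNLP. The helper is_verb and the intermediate names top, low and chains are made up for illustration, and treating every POS tag that starts with "VB" as a verb is an assumption.

```r
# Toy copy of the merged frame from the question (only the columns used here).
dp <- data.frame(
  sentence     = 1,
  dependentIdx = c(1, 2, 3, 4, 5, 6, 7, 8, 8, 9),
  governorIdx  = c(0, 3, 1, 5, 1, 5, 5, 1, 5, 1),
  governor     = c("ROOT", "numbers", "generate", "column", "generate",
                   "column", "column", "generate", "column", "generate"),
  dependent    = c("generate", "odd", "numbers", "from", "column",
                   "one", "and", "print", "print", "."),
  POS_gov      = c(NA, "NNS", "VB", "NN", "VB", "NN", "NN", "VB", "NN", "VB"),
  POS_dep      = c("VB", "JJ", "NNS", "IN", "NN", "CD", "CC", "NN", "NN", "."),
  stringsAsFactors = FALSE
)

# Hypothetical helper: a tag counts as a verb if it starts with "VB" (assumption).
# NA tags (the ROOT row) are treated as non-verbs.
is_verb <- function(pos) !is.na(pos) & startsWith(pos, "VB")

# Level 1 -> 2: rows where a verb governs a non-verb.
top <- dp[is_verb(dp$POS_gov) & !is_verb(dp$POS_dep),
          c("sentence", "dependentIdx", "dependent", "governor")]
names(top) <- c("sentence", "topicIdx", "Topic1", "Action")

# Level 2 -> 3: rows where a non-verb governs another non-verb.
low <- dp[!is_verb(dp$POS_gov) & !is_verb(dp$POS_dep) & dp$governorIdx > 0,
          c("sentence", "governorIdx", "dependent")]
names(low) <- c("sentence", "topicIdx", "Topic2")

# Join the two levels on the shared middle token (Topic1).
chains <- merge(top, low, by = c("sentence", "topicIdx"))
chains <- chains[, c("Topic1", "Topic2", "Action")]
print(chains)
```

With the tags exactly as posted, generate is the only verb, so every row has generate as its Action, and print shows up only as a Topic2 under column. Rows with print as the Action can appear only once the parser tags print as a verb. This sketch also drops dependents that have no children of their own (such as the final period); keeping them would need all.x = TRUE in the merge.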

Upvotes: 1
