Reputation: 623
I am using dependency parsing for a use case in R with the corenlp package. However, I need to tweak the dataframe for a specific use case.
I need a dataframe where I have three columns. I have used the below code to reach till the dependency tree.
devtools::install_github("statsmaths/coreNLP")
coreNLP::downloadCoreNLP()
initCoreNLP()
inp_cl = "generate odd numbers from column one and print."
output = annotateString(inp_cl)
dc = getDependency(output)
sentence governor dependent type governorIdx dependentIdx govIndex depIndex
1 1 ROOT generate root 0 1 NA 1
2 1 numbers odd amod 3 2 3 2
3 1 generate numbers dobj 1 3 1 3
4 1 column from case 5 4 5 4
5 1 generate column nmod:from 1 5 1 5
6 1 column one nummod 5 6 5 6
7 1 column and cc 5 7 5 7
8 1 generate print nmod:from 1 8 1 8
9 1 column print conj:and 5 8 5 8
10 1 generate . punct 1 7 1 10
Using POS tagging with the following code, I ended up with the following data frame.
ps = getToken(output)
ps = ps[,c(1,2,7,3)]
colnames(dc)[8] = "id"
dp = merge(dc, ps[,c("sentence","id","POS")],
by.x=c("sentence","governorIdx"),by.y = c("sentence","id"),all.x = T)
dp = merge(dp, ps[,c("sentence","id","POS")],
by.x=c("sentence","dependentIdx"),by.y = c("sentence","id"),all.x = T)
colnames(dp)[9:10] = c("POS_gov","POS_dep")
sentence dependentIdx governorIdx governor dependent type govIndex id POS_gov POS_dep
1 1 1 0 ROOT generate root NA 1 <NA> VB
2 1 2 3 numbers odd amod 3 2 NNS JJ
3 1 3 1 generate numbers dobj 1 3 VB NNS
4 1 4 5 column from case 5 4 NN IN
5 1 5 1 generate column nmod:from 1 5 VB NN
6 1 6 5 column one nummod 5 6 NN CD
7 1 7 5 column and cc 5 7 NN CC
8 1 8 1 generate print nmod:from 1 8 VB NN
9 1 8 5 column print conj:and 5 8 NN NN
10 1 9 1 generate . punct 1 9 VB .
In case a verb(action word) is attached to a non-verb(non action word), but the non-verb(non-action word) is connected to other non-verb(non-action words) then one row should indicate the entire connection. Eg: generate is a verb connected to numbers and numbers is a non verb connected to odd.
So the intended data frame needs to be
Topic1 Topic2 Action
numbers odd generate
column from generate
column one generate
column and generate
column from print
column one print
column and print
. generate
Upvotes: 1
Views: 99
Reputation: 1508
First you'll need to have your dependency tree tag print as a verb, rather than a noun.
Try using a sentence with two independent clauses, and see if the root of the second independent clause is tagged as such.
If so, it's a simple walk through the governoridx column. If not, you'll need to address the mechanics of your dependency tree generator.
Upvotes: 1