Reputation: 395
I'm working on a machine translation project. I need to identify the subject, verb, and object (SVO) of a sentence in order to continue my work. Currently I'm using the Stanford NLP parser to analyze the sentence, but I don't know how to extract the SVO from its output. Any ideas that I can consider?
Upvotes: 2
Views: 3716
Reputation: 1787
I think it would be pretty tough to implement a full analysis of English sentences yourself. First, you would need a dictionary that gives all possible parts of speech for each word. Then you would build the structure of the sentence according to a set of rules.
Some of the most basic rules are like this:
NP(Noun Phrase): N(Noun),
Pronoun,
[any number of ADJP(Adjective Phrase)] + N,
NP + [any number of ADJP],
NP + CONJ + NP
ADJP(Adjective Phrase): ADJ(Adjective),
[any number of ADVP(Adverb Phrase)] + ADJP,
PREP(Preposition) + NP
ADVP(Adverb Phrase): ADV(Adverb),
ADV + ADVP
VP(Verb Phrase): Vi(Intransitive Verb),
Vt(Transitive Verb) + NP,
VP + [any number of ADVP],
VP + CONJ + VP,
[any number of ADVP] + VP
S(Sentence): NP(Noun Phrase) + VP(Verb Phrase),
NP + AUX_V(Auxiliary Verb) + VP,
VP(Verb Phrase) (<=imperative sentence),
S + CONJ + S
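To make the rules above concrete, here is a minimal sketch of a recognizer that checks whether a sequence of part-of-speech tags can be reduced to S. The names `GRAMMAR` and `recognizes` are made up for this illustration, and "[any number of ADJP] + N" is approximated by the recursive rule NP -> ADJP NP, which is slightly more permissive than the rule as written. It only answers yes or no; it does not build a tree.

```python
from functools import lru_cache

# The rules from the answer, written as: symbol -> list of right-hand sides.
# Anything that is not a key here (N, Vt, ADV, ...) is treated as a word tag.
GRAMMAR = {
    "NP": [["N"], ["Pronoun"], ["ADJP", "NP"], ["NP", "ADJP"], ["NP", "CONJ", "NP"]],
    "ADJP": [["ADJ"], ["ADVP", "ADJP"], ["PREP", "NP"]],
    "ADVP": [["ADV"], ["ADV", "ADVP"]],
    "VP": [["Vi"], ["Vt", "NP"], ["VP", "ADVP"], ["VP", "CONJ", "VP"], ["ADVP", "VP"]],
    "S": [["NP", "VP"], ["NP", "AUX_V", "VP"], ["VP"], ["S", "CONJ", "S"]],
}


def recognizes(tags, start="S"):
    """Return True if the tag sequence can be derived from `start`."""
    tags = tuple(tags)

    @lru_cache(maxsize=None)
    def derives(symbol, i, j):
        # Does `symbol` cover exactly tags[i:j]?
        if symbol not in GRAMMAR:
            return j == i + 1 and tags[i] == symbol
        return any(matches(tuple(rhs), i, j) for rhs in GRAMMAR[symbol])

    @lru_cache(maxsize=None)
    def matches(rhs, i, j):
        # Does the sequence of symbols `rhs` cover exactly tags[i:j]?
        if len(rhs) == 1:
            return derives(rhs[0], i, j)
        # Try every split point; each remaining symbol needs at least one tag.
        return any(
            derives(rhs[0], i, k) and matches(rhs[1:], k, j)
            for k in range(i + 1, j - len(rhs) + 2)
        )

    return len(tags) > 0 and derives(start, 0, len(tags))


if __name__ == "__main__":
    print(recognizes(["ADJ", "N", "Vi", "ADV", "ADV"]))
    print(recognizes(["N", "N"]))
```

Because every right-hand side with two or more symbols splits the span into strictly smaller pieces, the left-recursive rules (such as VP -> VP ADVP) do not cause infinite recursion here.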
Using these rules, quite a few sentences can be analyzed, including
My dog runs very fast.
=> ADJ N Vi ADV ADV
=> (ADJ N) (Vi) (ADV ADVP)
=> (NP) (VP ADVP )
=> (NP VP)
=> (S)
and
I do not really like snacks like chips or candy.
=> N AUX_V ADV ADV Vt N PREP N CONJ N
=> (NP) (AUX_V) (ADV) (ADV) (Vt) (NP) (PREP (NP CONJ NP))
=> (NP) (AUX_V) (ADV) (ADV) (Vt) (NP) (PREP NP)
=> (NP) (AUX_V) (ADV) (ADV) (Vt) (NP ADJP)
=> (NP) (AUX_V) (ADV) (ADV) (Vt NP)
=> (NP) (AUX_V) (ADVP ADVP VP)
=> (NP AUX_V VP)
=> (S)
but it still can't analyze complex sentences like:
He is the one who won the Nobel Prize in 2014.
or
It is computers that brought the biggest change to our lives in history.
You would need to add rules for clauses and conjunctions (like "while", "when" and "if"). You would also need to add rules for infinitives and gerunds. On top of that, you would need rules for verbs that take two objects (like "give" and "tell") and for verbs that take a state of the subject or object (like "look", "seem" and "get", and also "make" in "I made you angry."), and so on.
Even after you have added all the rules of English, there are more complex things to deal with. For example,
They are hunting dogs.
(This sentence has two possible structures and therefore two different meanings. In one, "hunting dogs" is an ADJP, so they are in the act of hunting dogs; in the other, it is an NP, so they are dogs used for hunting.)
or
She told me that she loved me, which was a lie.
(In this case, the clause "which was a lie" (an ADJP) describes the NP "that she loved me", but in theory it could also describe "me" (a pronoun is also an NP) or the whole of "She told me that she loved me". The program would somehow have to figure out that the first reading is the most likely one.)
So what I would do is build something like a graph of the possible structures, based on the dictionary and the rules, while processing the sentence word by word. I would then apply tabu search to reduce the possibilities. Finally, I would have to use statistics, or otherwise somehow make the computer understand the real-world situation, in order to choose the most likely structure among the possibilities that remain.
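One common way to hold all possible structures at once is a chart, as in the CYK algorithm; this is a swapped-in technique for illustration, not the graph plus search approach described above. The sketch below counts how many distinct trees a tiny, made-up grammar assigns to "They are hunting dogs". The tag names (`AUX`, `COP`, `VING`, `VPING`) and the names `LEXICON`, `BINARY_RULES` and `count_parses` are assumptions for this example and are simpler than the rules earlier in this answer.

```python
from collections import defaultdict

# Hypothetical dictionary: each word maps to all tags it may take.
LEXICON = {
    "they": {"NP"},
    "are": {"AUX", "COP"},       # auxiliary "are" vs. linking "are"
    "hunting": {"VING", "ADJ"},  # participle vs. adjective
    "dogs": {"NP"},
}

# Hypothetical binary rules: (parent, left child, right child).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "AUX", "VPING"),   # are + [hunting dogs]
    ("VPING", "VING", "NP"),  # hunting + dogs (verb + object)
    ("VP", "COP", "NP"),      # are + [hunting dogs] as a noun phrase
    ("NP", "ADJ", "NP"),      # hunting + dogs (adjective + noun)
]


def count_parses(words, start="S"):
    """Count the distinct trees labelled `start` that span all the words."""
    n = len(words)
    # chart[(i, j)][label] = number of trees with that label over words[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for tag in LEXICON.get(word.lower(), ()):
            chart[(i, i + 1)][tag] += 1
    # Fill longer spans from shorter ones.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for parent, left, right in BINARY_RULES:
                    chart[(i, j)][parent] += chart[(i, k)][left] * chart[(k, j)][right]
    return chart[(0, n)][start]


if __name__ == "__main__":
    print(count_parses("They are hunting dogs".split()))
```

Choosing between the surviving trees would still need something extra, for example rule probabilities estimated from a treebank; that step is not shown here.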
By the way, the Stanford parser gave a wrong structure when I entered the last example: "She told me that she loved [me, which was a lie]". As this shows, it is not easy to make this kind of thing work perfectly.
Upvotes: 3
Reputation: 83
It seems straightforward to me: the subject is at the same depth in the parse tree as the verb phrase, and the object is typically the first NP inside the verb phrase. That said, determining the syntactic status of each element so that all edge cases are covered isn't necessarily a simple task.
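As a rough sketch of that heuristic, the code below reads a Penn Treebank style bracketed parse (the format the Stanford parser can print) and picks the NP sibling of the VP as the subject, the verb inside the innermost VP, and the first NP in that VP as the object. The helper names (`parse_brackets`, `extract_svo`) are made up for this example, and the bracketed strings are hand-written rather than real parser output. It ignores passives, clauses, coordination, and other edge cases.

```python
def parse_brackets(text):
    """Parse a bracketed tree string into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return read()


def words(node):
    """Return the words under a node, in order."""
    if isinstance(node, str):
        return [node]
    return [w for child in node[1] for w in words(child)]


def find(node, label):
    """Depth-first search for the first subtree with the given label."""
    if isinstance(node, str):
        return None
    if node[0] == label:
        return node
    for child in node[1]:
        hit = find(child, label)
        if hit is not None:
            return hit
    return None


def child_with(node, prefix):
    """First direct child whose label starts with `prefix`, or None."""
    for child in node[1]:
        if not isinstance(child, str) and child[0].startswith(prefix):
            return child
    return None


def extract_svo(tree):
    """Heuristic: subject = NP next to VP, object = first NP inside the VP."""
    clause = find(tree, "S")
    if clause is None:
        return (None, None, None)
    subject, vp = child_with(clause, "NP"), child_with(clause, "VP")
    # Step through auxiliaries: (VP (VBZ does) (VP (VB like) (NP ...)))
    while vp is not None and child_with(vp, "VP") is not None:
        vp = child_with(vp, "VP")
    verb = child_with(vp, "VB") if vp is not None else None
    obj = child_with(vp, "NP") if vp is not None else None
    join = lambda n: " ".join(words(n)) if n is not None else None
    return (join(subject), join(verb), join(obj))


if __name__ == "__main__":
    tree = parse_brackets(
        "(ROOT (S (NP (PRP$ My) (NN dog)) (VP (VBZ chases) (NP (DT the) (NN cat))) (. .)))"
    )
    print(extract_svo(tree))  # ('My dog', 'chases', 'the cat')
```

A dependency parse, where subject and object relations are labelled directly, may be an easier starting point than a constituency tree, but that is a different approach from the one sketched here.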
Upvotes: 0