Reputation: 97
I'm a beginner in Natural Language Processing and I'm reading about POS tagging and constituents. I came across cases where the constituent structure of a sentence is correct but the POS tagging is wrong.
I used the Stanford Parser. http://nlp.stanford.edu:8080/parser/index.jsp
For example, for "Madam, I'm Adam" the parser tags Madam as an adverb, which is wrong, yet the constituent structure is correct.
I'm looking for the opposite: a sentence where the POS tagging is correct but the constituent structure is wrong. Is that possible for any sentence?
Upvotes: 2
Views: 1180
Reputation: 141
Since there are different ways to divide up a given string into smaller and smaller substrings, it is no surprise that a correct POS-tagging of a given string of words may be assigned an incorrect constituent structure.
The example given by @Sherlocked is a case of syntactic ambiguity where one meaning is preferable to the other(s). Another case of this kind is the noun phrase the advisory committee members in the sentence in (1).
(1) The advisory committee members were asked not to talk to the press
On the most natural interpretation of the noun phrase in question, the adjective advisory modifies the noun committee, which means that a structure where advisory and committee form a constituent is preferable to a structure where committee and members form a constituent. (An example where committee and members should form a constituent is the noun phrase the tired committee members.)
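To make "different ways to divide up a given string" concrete, here is a minimal sketch that enumerates every bracketing of a word sequence, assuming (as a simplification) that every constituent is binary-branching. The function name bracketings is hypothetical and square brackets stand for unlabeled constituents. For the three words of the noun phrase above it yields exactly the two structures being compared; in general the count grows as the Catalan numbers, so a seven-word sentence already has 132 binary bracketings.

```python
def bracketings(words):
    """Enumerate every binary-branching bracketing of a word sequence.

    Assumes each constituent has exactly two daughters (a simplification).
    """
    if len(words) == 1:
        return [words[0]]
    results = []
    for k in range(1, len(words)):  # split point between words[:k] and words[k:]
        for left in bracketings(words[:k]):
            for right in bracketings(words[k:]):
                results.append(f"[{left} {right}]")
    return results

for b in bracketings(["advisory", "committee", "members"]):
    print(b)
# [advisory [committee members]]   (the reading suited to "tired committee members")
# [[advisory committee] members]   (the preferred reading here)

print(len(bracketings(["They", "wanted", "him", "to", "read", "the", "book"])))  # 132
```

Both bracketings contain the same words with the same POS tags, so a tagger can be entirely right while the parser picks the less natural grouping.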
To give but one of many possible examples of the case you ask about that doesn’t involve syntactic ambiguity, consider the sentence in (2).
(2) They wanted him to read the book
According to transformational grammar theories, him in (2) should be part of the embedded infinitival clause and not a direct object of the verb wanted; one reason for assuming this is that him in (2) receives a thematic role from the verb read. It is conceivable that a parser would assign the correct POS tags to the words in (2) but would assign (2) an incorrect structure in which him is the direct object of wanted. A factor increasing the likelihood of such a mistake is that if we replace wanted in (2) by, for example, told, the correct structure is one where him is the direct object of told, since in They told him to read the book, him receives a thematic role from told.
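The contrast can be made concrete by listing which word sequences each analysis groups together. This is a minimal sketch assuming simplified, unlabeled trees written as nested Python lists; the encodings and the helper names leaves and constituent_yields are hypothetical and are not the Stanford Parser's output format. Both analyses share one POS tagging and differ only in whether him to read the book forms a constituent.

```python
def leaves(tree):
    """Return the words of a nested-list tree, left to right."""
    if isinstance(tree, str):
        return (tree,)
    return tuple(word for child in tree for word in leaves(child))

def constituent_yields(tree):
    """Return the word sequence covered by every phrasal (non-leaf) node."""
    if isinstance(tree, str):
        return set()
    result = {leaves(tree)}
    for child in tree:
        result |= constituent_yields(child)
    return result

# One POS tagging (Penn Treebank tags) shared by both analyses.
TAGS = [("They", "PRP"), ("wanted", "VBD"), ("him", "PRP"), ("to", "TO"),
        ("read", "VB"), ("the", "DT"), ("book", "NN")]

# Hypothetical simplified trees.
# Clausal complement: "him to read the book" is one constituent.
CLAUSAL = ["They", ["wanted", ["him", ["to", ["read", ["the", "book"]]]]]]
# Direct object: "him" is a sister of the verb, outside the infinitival clause
# (the structure appropriate for "told").
OBJECT = ["They", ["wanted", "him", ["to", ["read", ["the", "book"]]]]]

assert leaves(CLAUSAL) == leaves(OBJECT) == tuple(w for w, _ in TAGS)

only_clausal = constituent_yields(CLAUSAL) - constituent_yields(OBJECT)
only_object = constituent_yields(OBJECT) - constituent_yields(CLAUSAL)
print("only in clausal analysis:", sorted(only_clausal))
# only in clausal analysis: [('him', 'to', 'read', 'the', 'book')]
print("only in object analysis:", sorted(only_object))
# only in object analysis: []
```

The words and tags are identical in both trees; a parser that picks OBJECT for wanted makes exactly the kind of error asked about.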
Upvotes: 2
Reputation: 97
Yes. It is possible to have correct POS tags but a wrong constituent structure.
Example sentence - They played in the ground with grass turf.
The parser's POS tags and constituent structure are:
(ROOT
  (S
    (NP (PRP They))
    (VP (VBD played)
      (PP (IN in)
        (NP (DT the) (NN ground)))
      (PP (IN with)
        (NP (NN grass) (NN turf))))))
This structure means "They played in the ground, with (the help of / by using) grass turf". The correct constituent structure, however, is:
(ROOT
  (S
    (NP (PRP They))
    (VP (VBD played)
      (PP (IN in)
        (NP
          (NP (DT the) (NN ground))
          (PP (IN with)
            (NP (NN grass) (NN turf))))))))
This means "They played in the ground that had grass turf".
In the first structure, the PP with grass turf modifies the verb phrase (how they played); in the second, it modifies the noun phrase the ground, which is the semantically appropriate reading.
Without a comma the sentence is slightly ambiguous, but it is syntactically and semantically well-formed.
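One way to check a claim like this is to compare the two trees mechanically: extract the (word, tag) pairs and the labeled constituent spans from each and diff them. The sketch below is a minimal pure-Python version (no NLTK, to stay self-contained); parse_tree, pos_tags and constituents are hypothetical helpers, not part of the Stanford Parser. For the noun-attachment reading it assumes the usual Penn Treebank encoding, where the PP attaches to the noun phrase as (NP (NP the ground) (PP with grass turf)). Spans are (label, start, end) with end exclusive, counting words from 0.

```python
import re

def parse_tree(text):
    """Parse a Penn-Treebank-style bracketed string into (label, children) tuples.

    Words stay plain strings; every word is assumed to sit under a POS tag.
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def parse():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse())
            else:                     # a word
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return parse()

def is_preterminal(node):
    """True for a POS-tag node such as (NN ground)."""
    _, children = node
    return len(children) == 1 and isinstance(children[0], str)

def pos_tags(node):
    """Return the (word, tag) pairs, left to right."""
    if is_preterminal(node):
        return [(node[1][0], node[0])]
    return [pair for child in node[1] for pair in pos_tags(child)]

def constituents(node, start=0):
    """Return ({(label, start, end)} for phrasal nodes, end index)."""
    if is_preterminal(node):
        return set(), start + 1
    spans, i = set(), start
    for child in node[1]:
        child_spans, i = constituents(child, i)
        spans |= child_spans
    spans.add((node[0], start, i))
    return spans, i

VP_ATTACH = ("(ROOT (S (NP (PRP They)) (VP (VBD played) (PP (IN in) (NP (DT the) (NN ground)))"
             " (PP (IN with) (NP (NN grass) (NN turf))))))")
NP_ATTACH = ("(ROOT (S (NP (PRP They)) (VP (VBD played) (PP (IN in) (NP (NP (DT the) (NN ground))"
             " (PP (IN with) (NP (NN grass) (NN turf))))))))")

a, b = parse_tree(VP_ATTACH), parse_tree(NP_ATTACH)
assert pos_tags(a) == pos_tags(b)     # identical POS tagging
spans_a, _ = constituents(a)
spans_b, _ = constituents(b)
print("only in VP attachment:", sorted(spans_a - spans_b))
# only in VP attachment: [('PP', 2, 5)]
print("only in NP attachment:", sorted(spans_b - spans_a))
# only in NP attachment: [('NP', 3, 8), ('PP', 2, 8)]
```

The tags match word for word, while the constituents differ: the VP-attachment tree closes the PP after the ground, whereas the NP-attachment tree extends both the PP headed by in and a new NP over the ground with grass turf.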
Upvotes: 4