duichwer

Reputation: 177

Is it possible to get better results for parsing imperative sentences with StanfordNLP?

I want to find patterns in sentence structure, so I'm generating parse trees as a preprocessing step.

Until now I have used the Stanford CoreNLPParser. Many of my sentences are imperatives. After getting many more clusters than I expected, I reviewed the parse trees and found that verbs at the beginning of my imperative sentences were often parsed as noun phrases (NP).

I found the following answer: https://stackoverflow.com/a/35887762/6068675

Since this answer is from 2016, I was hoping there might be another option that gives better results. Simply lowercasing the first word of every sentence doesn't look like an ideal solution.

Here are a few examples that were parsed incorrectly:

(ROOT (S (S (NP (NNP View)) (NP (NP (DT a) (NN list)) (PP (IN of) (NP (JJ ongoing) (NNS sales) (NNS quotes))) (PP (IN for) (NP (DT the) (NN customer))))) (. .)))

(ROOT (NP (NP (NN Request) (NN approval) (S (VP (TO to) (VP (VB change) (NP (DT the) (NN record)))))) (. .)))

Further examples:

(ROOT (NP (NP (NNP View)) (CC or) (VP (VB change) (NP (NP (JJ detailed) (NN information)) (PP (IN about) (NP (DT the) (NN customer))))) (. .)))
(ROOT (FRAG (PP (IN Post) (NP (DT the) (VBN specified) (NN prepayment) (NN information))) (. .)))
(ROOT (S (S (NP (NNP View)) (NP (NP (DT a) (NN summary)) (PP (IN of) (NP (DT the) (NN debit) (CC and) (NN credit) (NNS balances))) (PP (IN for) (NP (JJ different) (NN time) (NNS periods))))) (. .)))
(ROOT (NP (NP (NP (NN Offer) (NNS items)) (CC or) (NP (NP (NNS services)) (PP (TO to) (NP (DT a) (NN customer))))) (. .)))
(ROOT (NP (NP (NP (NNP View)) (CC or) (VP (VB add) (NP (NP (NNS comments)) (PP (IN for) (NP (DT the) (NN record)))))) (. .)))
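To review many parses at once, it can help to flag the trees whose first token was not tagged as a verb. The following is a minimal sketch that works directly on the bracketed strings shown above. The helper names `first_token_tag` and `starts_with_verb` are hypothetical, and treating only `VB` and `VBP` as acceptable sentence-initial tags for imperatives is an assumption of this sketch, not something the parser guarantees.

```python
import re

# A preterminal node looks like "(NNP View)": a POS tag followed by one word.
PRETERMINAL = re.compile(r"\(([^\s()]+) ([^\s()]+)\)")

# Assumption: an imperative should start with a base-form verb.
VERB_TAGS = {"VB", "VBP"}


def first_token_tag(tree):
    """Return (tag, word) for the leftmost leaf of a bracketed parse, or None."""
    match = PRETERMINAL.search(tree)
    if match is None:
        return None
    return match.group(1), match.group(2)


def starts_with_verb(tree):
    """True if the leftmost leaf of the parse carries a verb tag."""
    first = first_token_tag(tree)
    return first is not None and first[0] in VERB_TAGS


tree = ("(ROOT (NP (NP (NN Request) (NN approval) (S (VP (TO to) "
        "(VP (VB change) (NP (DT the) (NN record)))))) (. .)))")
print(first_token_tag(tree))   # ('NN', 'Request')
print(starts_with_verb(tree))  # False
```

Running this over all parses gives a quick count of how many imperatives are affected, which makes it easier to judge whether a workaround is worth applying.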

Upvotes: 1

Views: 916

Answers (2)

Dhruv Jimulia

Reputation: 641

I looked at your examples, and it seems the parser is tagging the first word of your imperatives as a noun (NN) or proper noun (NNP). I think that is because the first word begins with a capital letter, as proper nouns do. Essentially, the parser hasn't "learned" that every English sentence starts with a capital letter, so sentence-initial capitalization carries no information about the part of speech.

Of course, the long-term solution is to train the parser on additional text that contains imperatives, but there is also a short-term hack: since the problem lies with the capital letters, you can use Python's str.lower() method (or the equivalent in another language) to lowercase the string before parsing it.

I tried this with your examples, and the parser now correctly classifies the beginning of your sentences as verb phrases.

Note: This hack may cause proper nouns to be misclassified, because they will no longer be capitalized. As long as you are not relying on the parser to detect proper nouns alongside imperatives, you should be okay.
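One way to limit the effect on proper nouns is to lowercase only the first word rather than the whole string. This is a variant of the hack, not what was tested above, so treat it as a minimal sketch. The function name `lowercase_first_word` is hypothetical, the all-caps check is a heuristic assumption for acronyms, and the input is assumed to have no leading whitespace.

```python
def lowercase_first_word(sentence):
    """Lowercase only the first space-delimited word of a sentence."""
    head, sep, rest = sentence.partition(" ")
    # Heuristic assumption: leave all-caps tokens such as acronyms untouched.
    if len(head) > 1 and head.isupper():
        return sentence
    return head.lower() + sep + rest


print(lowercase_first_word("View a list of ongoing sales quotes for the customer."))
# view a list of ongoing sales quotes for the customer.
```

Capitalized words later in the sentence keep their case, so the parser can still use that cue for them. A proper noun in first position would still be lowercased, so this does not remove the caveat entirely.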

Upvotes: 1

StanfordNLPHelp

Reputation: 8739

Unfortunately, the part-of-speech tagger is trained on Wall Street Journal text from years ago, and imperative statements are largely missing from that training data, so it is going to guess wrong at times. It does get some imperative statements right, though. I think you will get better results when the first word is clearly a verb, like "Call".

Another issue I saw is that the verb "text" (as in sending a text message) is not handled well.

I think we would be excited to add some contemporary data and add some imperative training data to help out.

Upvotes: 0
