Reputation: 1614
If a single query from the user contains multiple questions belonging to different categories, how can they be identified, split and parsed?
Eg -
User - what is the weather now and tell me my next meeting
Parser - {:weather => "what is the weather", :schedule => "tell me my next meeting"}
Parser identifies the parts of sentences where the question belongs to two different categories
User - show me hotels in san francisco for tomorrow that are less than $300 but not less than $200 are pet friendly have a gym and a pool with 3 or 4 stars staying for 2 nights and dont include anything that doesnt have wifi
Parser - {:hotels => ["show me hotels in san francisco",
"for tomorrow", "less than $300 but not less than $200",
"pet friendly have a gym and a pool",
"with 3 or 4 stars", "staying for 2 nights", "with wifi"]}
Parser identifies the question belonging to only one category but has additional steps for fine tuning the answer and created an array ordered according to the steps to take
From what I can understand this requires a sentence segmenter, multi-label classifier and co-reference resolution
But the sentence segementer I have come across depend heavily on grammar, punctuations.
Multi-label classifiers, like a good trained naive bayes classifier works in most cases but since they are multi-label, most times output multiple categories for sentences which clearly belong to one class. Depending solely on the array outputs to check the labels present would fail.
If used a multi-class classifier, that is also good to check the array output of probable categories but obviously they dont tell the different parts of the sentence much accurately, much less in what fashion to proceed with the next step.
As a first step, how can I tune sentence segmenter to correctly split the sentence without any strict grammar rules. Good accuracy of this would help a lot in classification.
Upvotes: 1
Views: 901
Reputation: 6039
As a first step, how can I tune sentence segmenter to correctly split the sentence without any strict grammar rules.
Instead of doing this I'd suggest you use the parse-tree directly (either dependency parser, or constituency parse).
Here I'm showing the output of the dependency parse and you can see that the two segments are separated via a "CONJ" arrow:
(from here: http://deagol.cs.illinois.edu:8080/)
Another solution I'd give try is ClausIE: https://gate.d5.mpi-inf.mpg.de/ClausIEGate/ClausIEGate?inputtext=what+is+the+weather+now+and+tell+me+my+next+meeting++&processCcAllVerbs=true&processCcNonVerbs=true&type=true&go=Extract
Upvotes: 2
Reputation: 2378
If you want something for segmentation that doesn't depend on grammar heavily, then chunking comes to mind. In the NLTK book there is a fragment on that. The approach authors take here depends only on part of speech tags.
BTW Jurafsky and Martin's 3rd ed of Speech and Language processing contains information on chunking in the parsing chapter, and it also contains a chapters on information retrieval nad chatbots.
Upvotes: 2