Reputation: 67
I've been trying to use:
$string="The Dr. is here!!! I am glad I'm in the U.S.A. for the Dr. quality is great!!!!!!";
preg_match_all('~.*?[?.!]~s',$string,$sentences);
print_r($sentences);
But it breaks on abbreviations such as Dr. and U.S.A., because it treats every period as the end of a sentence.
Does anyone have any better suggestions?
Upvotes: 2
Views: 5683
Reputation: 2625
There is no simple solution for this. You need to do some natural language processing (NLP) in your application to recognize each sentence. There is OpenNLP, a Java-based NLP toolkit, and the Stanford NLP parser, which can be used from Ruby. You can probably find something similar for PHP.
Here I found a set of classes for natural language processing in PHP.
Upvotes: 11
Reputation: 5335
This is almost impossible with punctuation alone: as your example shows, the same characters that end a sentence also appear inside abbreviations such as Dr. and U.S.A., so a period by itself does not tell you where a sentence starts or ends.
You have to look at the characters that follow the punctuation to decide whether a new sentence starts there.
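A rough sketch of that idea, not a complete solution: treat whitespace as a sentence boundary only when it follows ., ? or ! and is followed by an uppercase letter, then glue a chunk back onto the next one if it ends in a known abbreviation. The function name splitSentences and the abbreviation list are made up for illustration, and the list would need extending for real text.

```php
<?php
// Hypothetical helper: heuristic sentence splitter (illustrative only).
function splitSentences(string $text, array $abbreviations = ['Dr.', 'Mr.', 'Mrs.', 'U.S.A.']): array
{
    // Candidate boundaries: whitespace preceded by . ? or ! and followed by an uppercase letter.
    $parts = preg_split('/(?<=[.?!])\s+(?=\p{Lu})/u', trim($text), -1, PREG_SPLIT_NO_EMPTY);

    $sentences = [];
    $buffer = '';
    foreach ($parts as $part) {
        // Re-join with a single space (the original whitespace is not preserved).
        $buffer = ($buffer === '') ? $part : $buffer . ' ' . $part;

        // If the last word is a known abbreviation, assume the sentence continues.
        $words = preg_split('/\s+/', $buffer);
        $lastWord = end($words);
        if (in_array($lastWord, $abbreviations, true)) {
            continue;
        }

        $sentences[] = $buffer;
        $buffer = '';
    }

    // Flush whatever is left (e.g. text that ends with an abbreviation).
    if ($buffer !== '') {
        $sentences[] = $buffer;
    }

    return $sentences;
}

$string = "The Dr. is here!!! I am glad I'm in the U.S.A. for the Dr. quality is great!!!!!!";
print_r(splitSentences($string));
```

On the string from the question this gives two sentences, "The Dr. is here!!!" and the rest. It is still a heuristic: a sentence that really ends in an abbreviation and is followed by a capitalized word (for example "... in the U.S.A. Then ...") will be merged with the next one.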
Upvotes: 0
Reputation: 60413
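Hmm, maybe try something like $sentences = preg_split('/(?<=[?.!])\s+/', $string); so that only the whitespace after the punctuation is treated as the delimiter. It will still split after abbreviations such as Dr., though.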
Upvotes: 1