Reputation: 314
I want to write a regex that matches each sentence and also every word within the matched sentence. A '!', '?' or '.' should be treated as the end of a sentence, and each word of the matched sentence should be matched as well.
My regex to match sentence: [^?!.]+
My regex to match each and every word separately: [^\s]+
But I can't work out how to combine these two regexes to do that.
Test strings:
I am Raktim Banerjee. I love to code.
should return
2 sentence 8 words
And
Stackoverflow is the best coding forum. I love stackoverflow!
should return
2 sentence 9 words.
Thanks in advance for your help.
Upvotes: 0
Views: 858
Reputation: 44053
I believe you said you wanted this in JavaScript:
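var s = 'I am Raktim Banerjee. I love to code.';
// One match per sentence: space-separated words followed by a terminator (!, ? or .)
var regex = /\b([^!?. ]+)(?:(?: +)([^!?. ]+))*\b([!?.])/g;
var m, numSentences = 0, numWords = 0;
do {
    m = regex.exec(s);
    if (m) {
        numSentences++;
        // m[0] is the whole sentence; splitting on spaces counts its words
        numWords += m[0].split(' ').length;
    }
} while (m);
console.log(numSentences + ' sentences, ' + numWords + ' words');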
Here is a second iteration. I modified the regex to recognize a few salutations (Mr., Mrs. and Dr.; you can add more) and added a primitive sub-expression to recognize an email address. I also simplified the original regex a bit. I hope this helps, but there are no guarantees, because the email check is overly simplified:
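var s = 'Mr. Raktim Banerjee. My email address is [email protected].';
// Each word is a salutation, an email-like token, or a run of characters other than !, ?, . and space
var regex = /\b((Mrs?\.|Dr\.|\S+@\S+|[^!?. ]+)\s*)+([!?.])/g;
var m, numSentences = 0, numWords = 0;
do {
    m = regex.exec(s);
    if (m) {
        numSentences++;
        // m[0] is the whole sentence; splitting on spaces counts its words
        numWords += m[0].split(' ').length;
    }
} while (m);
console.log(numSentences + ' sentences, ' + numWords + ' words');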
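If a single combined regex is not a hard requirement, the two patterns from the question can also be applied in two passes: first find each sentence, then find the words inside it. The following is only a rough sketch under that assumption. The function name `splitSentences` is made up for illustration, `matchAll` needs an ES2020-capable runtime, and abbreviations such as "Mr." would still be counted as sentence ends here.

```javascript
// Hypothetical helper: returns one array of words per sentence.
function splitSentences(text) {
  // A sentence is a run of non-terminators followed by one or more terminators.
  const sentencePattern = /[^?!.]+[?!.]+/g;
  const sentences = [];
  for (const match of text.matchAll(sentencePattern)) {
    // Drop the trailing terminator(s), then collect runs of non-whitespace.
    const body = match[0].replace(/[?!.]+$/, '');
    const words = body.match(/\S+/g) || [];
    if (words.length > 0) {
      sentences.push(words);
    }
  }
  return sentences;
}

const result = splitSentences('I am Raktim Banerjee. I love to code.');
const wordCount = result.reduce((sum, words) => sum + words.length, 0);
console.log(result.length + ' sentences, ' + wordCount + ' words');
// prints "2 sentences, 8 words"
```

Because each sentence keeps its own word list, `result[1]` holds the words of the second sentence, which may be closer to "match every word of a matched sentence" than a bare count.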
Upvotes: 1
Reputation: 104
Are you looking for something like this:
import re
s1 = "I am Raktim Banerjee. I love to code. "
s2 = "Stackoverflow is the best coding forum. I love stackoverflow! "
# Raw strings keep \s from being read as a string escape.
# The trailing space in each input yields one extra whitespace-only chunk, hence the -1.
for s in (s1, s2):
    print(len(re.findall(r"[^?!.]+", s)) - 1, "sentence", len(re.findall(r"[^\s]+", s)), "words")
Running the above outputs:
2 sentence 8 words
2 sentence 9 words
Upvotes: 1