RAKTIM BANERJEE

Reputation: 314

Regex to match a sentence and word of a string

I want to write a regex that matches each sentence of a string and each word within the matched sentence. A '!', '?' or '.' marks the end of a sentence, and every word of each matched sentence should also be matched.

My regex to match sentence: [^?!.]+

My regex to match each and every word separately: [^\s]+

But I can't work out how to combine these two regexes to do that.

Test strings:

I am Raktim Banerjee. I love to code.

should return

2 sentence 8 words

And

Stackoverflow is the best coding forum. I love stackoverflow!

should return

2 sentence 9 words.

Thanks in advance for your help.

Upvotes: 0

Views: 858

Answers (2)

Booboo

Reputation: 44053

I believe you said you wanted this in JavaScript:

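var s = 'I am Raktim Banerjee. I love to code.';

// One match per sentence: words separated by spaces, ending in !, ? or .
var regex = /\b([^!?. ]+)(?:(?: +)([^!?. ]+))*\b([!?.])/g;
var m, numSentences = 0, numWords = 0;
// With the g flag, exec() resumes after the previous match and returns null when done
while ((m = regex.exec(s)) !== null) {
    numSentences++;
    // m[0] is the whole sentence; split on runs of spaces so repeated spaces are not counted as words
    numWords += m[0].split(/ +/).length;
}
console.log(numSentences + ' sentences, ' + numWords + ' words');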

Here is a second iteration. I modified the regex to recognize a few salutations (Mr., Mrs. and Dr.; you can add more) and added a primitive sub-expression to recognize an email address. I also simplified the original regex a bit. I hope this helps, but there are no guarantees because the email check is overly simplified:

var s = 'Mr. Raktim Banerjee. My email address is [email protected].'

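// Each "word" is a salutation, something containing @, or a run of non-terminator characters
var regex = /\b((Mrs?\.|Dr\.|\S+@\S+|[^!?. ]+)\s*)+([!?.])/g;
var m, numSentences = 0, numWords = 0;
// With the g flag, exec() resumes after the previous match and returns null when done
while ((m = regex.exec(s)) !== null) {
    numSentences++;
    // The regex allows any whitespace between words, so split on whitespace runs
    numWords += m[0].split(/\s+/).length;
}
console.log(numSentences + ' sentences, ' + numWords + ' words');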
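
The two patterns from the question can also be combined by nesting rather than merging them into one regex: first match the sentences, then match the words inside each sentence. The following is only a minimal sketch of that idea. The helper name countSentencesAndWords is hypothetical, and the sentence pattern is the asker's [^?!.]+ with a required terminator appended, which is an assumption on my part. It does not handle salutations such as "Mr." or email addresses, and text after the last terminator is ignored.

```javascript
// Hypothetical helper (not part of the answer above): nests the two
// patterns from the question instead of merging them into one regex.
function countSentencesAndWords(text) {
    // A sentence: a run of non-terminators followed by at least one terminator.
    var sentenceRegex = /[^?!.]+[?!.]+/g;
    // A word: a run of non-whitespace characters (same as [^\s]+).
    var wordRegex = /\S+/g;

    // match() with the g flag returns all matches, or null if there are none.
    var sentences = text.match(sentenceRegex) || [];
    var numWords = 0;
    sentences.forEach(function (sentence) {
        numWords += (sentence.match(wordRegex) || []).length;
    });
    return { sentences: sentences.length, words: numWords };
}

console.log(countSentencesAndWords('I am Raktim Banerjee. I love to code.'));
console.log(countSentencesAndWords('Stackoverflow is the best coding forum. I love stackoverflow!'));
```

For the two test strings in the question this gives 2 sentences with 8 words and 2 sentences with 9 words. Because the sentence pattern stops at every '.', a string such as 'Mr. Banerjee.' would be counted as two sentences, which is the case the second regex above tries to address.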

Upvotes: 1

Ashutosh Parida

Reputation: 104

Are you looking for something like this?

import re

# The trailing space matters: it gives findall() one extra chunk, which the -1 below removes.
s1 = "I am Raktim Banerjee. I love to code. "
s2 = "Stackoverflow is the best coding forum. I love stackoverflow! "

sentence_re = re.compile(r"[^?!.]+")  # runs of text between sentence terminators
word_re = re.compile(r"\S+")          # runs of non-whitespace characters

for s in (s1, s2):
    print(len(sentence_re.findall(s)) - 1, "sentence", len(word_re.findall(s)), "words")

Running the above outputs:

2 sentence 8 words
2 sentence 9 words

Upvotes: 1
