sassy_rog

Reputation: 1097

Regex match first sentence even on one sentence

I have an interesting regex problem. Say I have a paragraph like this:

Johannesburg (; Afrikaans: ; also known as Jozi, Jo'burg, and eGoli) is the largest city in South Africa and one of the 50 largest urban areas in the world. It is the provincial capital and largest city of Gauteng, which is the wealthiest province in South Africa. While Johannesburg is not one of South Africa's three capital cities, it is the seat of the Constitutional Court. The city is located in the mineral-rich Witwatersrand range of hills and is the centre of large-scale gold and diamond trade.

This regex, (^.*?[a-z]{2,}[.!?])\s+\W*[A-Z], works well for finding the first sentence based on sentence-construction logic. The problem comes when I have just one sentence, like this:

Johannesburg (; Afrikaans: ; also known as Jozi, Jo'burg, and eGoli) is the largest city in South Africa and one of the 50 largest urban areas in the world.

It doesn't match this sentence, understandably, because no other sentence starts after it. How do I adjust the expression so that it applies to both cases?

Upvotes: 1

Views: 60

Answers (1)

The fourth bird

Reputation: 163362

You could use an alternation, (^.*?[a-z]{2,}[.!?])(?:\s+\W*[A-Z]|$), to either match the required follow-up logic or assert the end of the string with $. The first sentence is then in capture group 1:

(^.*?[a-z]{2,}[.!?])(?:\s+\W*[A-Z]|$)
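As a rough check of the alternation, here is a minimal Python sketch using the standard re module. The sample strings are shortened versions of the text in the question, and the helper name first_sentence is only an illustrative assumption, not part of the original answer.

```python
import re

# Capture group 1 holds the first sentence; the non-capturing group accepts
# either the start of a following sentence or the end of the string.
PATTERN = re.compile(r"(^.*?[a-z]{2,}[.!?])(?:\s+\W*[A-Z]|$)")


def first_sentence(text):
    """Return the first sentence of text, or None if nothing matches.

    Hypothetical helper for illustration only.
    """
    m = PATTERN.search(text)
    return m.group(1) if m else None


# Shortened sample inputs (assumed for this sketch).
one = "Johannesburg is the largest city in South Africa."
two = one + " It is the provincial capital of Gauteng."

print(first_sentence(one))  # single sentence: matched via the $ branch
print(first_sentence(two))  # two sentences: matched via the \s+\W*[A-Z] branch
```

Both calls should print the same first sentence, since group 1 stops at the closing punctuation in either branch.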


If you don't need the capturing group () at the start, you can omit it and use a positive lookahead (?=...) instead, so that the match itself contains only the sentence:

^.*?[a-z]{2,}[.!?](?=\s+\W*[A-Z]|$)
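To illustrate what "a match only" means, the following Python sketch compares the full match of the lookahead pattern with the full match of the consuming non-capturing-group form quoted earlier in this answer. The sample text is a shortened version of the question's paragraph, and the variable names are assumptions made for this example.

```python
import re

# Lookahead form: the text after the sentence is checked but not consumed.
LOOKAHEAD = re.compile(r"^.*?[a-z]{2,}[.!?](?=\s+\W*[A-Z]|$)")

# Consuming form: the start of the next sentence becomes part of the full match.
CONSUMING = re.compile(r"(^.*?[a-z]{2,}[.!?])(?:\s+\W*[A-Z]|$)")

# Shortened sample input (assumed for this sketch).
one = "Johannesburg is the largest city in South Africa."
two = one + " It is the provincial capital of Gauteng."

lookahead_match = LOOKAHEAD.search(two).group(0)
consuming_match = CONSUMING.search(two).group(0)

print(lookahead_match)  # only the first sentence
print(consuming_match)  # first sentence plus " I" from the next one
```

With the lookahead, group 0 can be used directly. With the consuming form, the sentence has to be read from group 1.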


Upvotes: 2
