user2936008

Reputation: 1347

Regex excluding words

Hi all, I am new to regex.

I have a string in which "etc." is treated as the end of a sentence. How can I change the existing regex so that "etc." is not treated as the end of a sentence?

sentence: 'hello how are you, can you pass me pen, book etc. I am going to travel abroad. I am going on vacation. Let me know if anything needs to be done in something.com.'; 
regex: (/(.*?(?:\.|\?|!))(?: |$)/g);


Upvotes: 0

Views: 81

Answers (3)

Hugo

Reputation: 109

This will do what you want:

/((?:[a-zA-Z0-9 ,]|(?<= etc)\.)+\.)/g

Note that you asked for "etc." not to be treated as a match. With this regexp the domain name will still be split, as there is a dot between "something" and "com".

Upvotes: 0

Me.Name

Reputation: 12544

The example case is exceptionally difficult, because "etc." followed by a capital letter would be a valid end of a sentence.

Looking ahead not only for the end of the line, but also for a capital letter at the start of the next word, would catch most cases:

var sentences = stringSentence.match(/(.*?(?:[.?!])\s*)(?=([A-Z])|$)/g);

But in this example, since "I" is a capital letter, it would still break. If a comma and/or a word such as 'because' were added after etc., the match would work (and the sentence would be grammatically more correct).
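To make that concrete, here is a small sketch that runs the pattern above against the sentence from the question and against a reworded copy. The variable names and the rewording ("etc., and I am") are assumptions made up for illustration.

```javascript
// Pattern from the answer above; the variable names are illustrative.
const capitalLookahead = /(.*?(?:[.?!])\s*)(?=([A-Z])|$)/g;

const original =
  "hello how are you, can you pass me pen, book etc. I am going to travel abroad. " +
  "I am going on vacation. Let me know if anything needs to be done in something.com.";

// Hypothetical rewording: a comma and "and" now follow "etc.".
const reworded = original.replace("etc. I am", "etc., and I am");

// Trim the trailing whitespace that the pattern captures with each sentence.
const splitOriginal = (original.match(capitalLookahead) || []).map((s) => s.trim());
const splitReworded = (reworded.match(capitalLookahead) || []).map((s) => s.trim());

console.log(splitOriginal.length); // 4: still breaks after "etc." because "I" is a capital
console.log(splitReworded.length); // 3: "etc.," no longer ends a sentence
```

In both runs "something.com." stays in one piece, since the dot inside the domain is followed by a lowercase letter.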

If that is not enough, exceptions could be added for known abbreviations. The problem is that an abbreviation can also be at the actual end of a sentence. For example, "I am going on vacation to relax etc." should match.
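One possible way to add such exceptions, as a rough sketch only: split with the capital-letter pattern first, then glue a piece back onto the next one when it ends with a listed abbreviation. The helper name `splitSentences` and the `ABBREVIATIONS` list are hypothetical, not part of any library.

```javascript
// Hypothetical abbreviation list; extend it as needed.
const ABBREVIATIONS = ["etc."];

// Hypothetical helper: split on . ? ! followed by a capital letter or the end
// of the text, then merge pieces that end with a known abbreviation.
function splitSentences(text, abbreviations = ABBREVIATIONS) {
  const pieces = text.match(/(.*?(?:[.?!])\s*)(?=([A-Z])|$)/g) || [];
  const sentences = [];
  let pending = "";
  for (const piece of pieces) {
    pending += piece;
    const endsWithAbbreviation = abbreviations.some((abbr) =>
      pending.trimEnd().endsWith(" " + abbr)
    );
    if (!endsWithAbbreviation) {
      sentences.push(pending.trim());
      pending = "";
    }
  }
  // An abbreviation at the very end of the text still closes the last sentence.
  if (pending.trim() !== "") sentences.push(pending.trim());
  return sentences;
}

const sample =
  "hello how are you, can you pass me pen, book etc. I am going to travel abroad. " +
  "I am going on vacation. Let me know if anything needs to be done in something.com.";
console.log(splitSentences(sample)); // 3 sentences, the first one keeps "etc." inside
```

The limitation described above still applies: when "etc." really does end a sentence in the middle of the text, this sketch merges it with the following sentence.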

Upvotes: 1

hawkeye315

Reputation: 15

The easiest way would be to write ".." or "..." after etc. However, if you can't do that, I would handle it by adding a specific matching case for etc, since it is indeed a special case. Try looking at this:

http://regexone.com/lesson/matching_characters (Look at the solution to get an idea)

One possible solution would be this:

(?<![\w\d])etc(?![\w\d])

This would match etc only as a standalone word, with no word characters directly before or after it, so a following period is allowed. I believe it would still accept .etc, though, if that is a problem.
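A quick way to check that behaviour is to test the pattern against a few strings. This sketch assumes a JavaScript engine with lookbehind support (ES2018 or later); the sample strings are made up for illustration.

```javascript
// Pattern from the answer above, without the g flag so that test() keeps no state.
const etcWord = /(?<![\w\d])etc(?![\w\d])/;

// Illustrative inputs: a normal use, two words that merely contain "etc", and ".etc".
const samples = ["pen, book etc. I am", "fetch", "etcetera", ".etc"];

const results = samples.map((s) => [s, etcWord.test(s)]);
for (const [text, matched] of results) {
  console.log(JSON.stringify(text), matched);
}
// "pen, book etc. I am" true
// "fetch" false
// "etcetera" false
// ".etc" true
```

Since `\w` already includes digits, `[\w\d]` behaves the same as `\w` here.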

Upvotes: 0
