Reputation: 197
I am looking for a way to extract the words from a sentence. I have gotten fairly far with the following expression:
\b([a-zA-Z]+?)\b
but in some cases it counts a word when I do not want it to, e.g. a word followed by more than one period, like "text..". So, in my regex, I want the period to appear at the end of a word zero or one times. Inserting \.? did not do the trick, and variations on it have not yielded anything useful either.
Hope someone can help!
Upvotes: 0
Views: 296
Reputation: 4564
To avoid a match on your example "text..", it is not enough to add \.? to check whether the first character after the word is a dot; you also need to look one character further and check the second character after the word.
I ended up with something like this:
\w{2,}\.?[^.]
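A caveat on the pattern above, with a possible alternative: it consumes the character after the word and, because of backtracking, can still report `text` for the input "text..". One way to "look one character further" without consuming anything is a negative lookahead. The Python below is only a sketch under the assumption that words are runs of ASCII letters; the names `WORD` and `extract_words` are made up for this example and are not from the original posts.

```python
import re

# Hypothetical helper pattern: a run of ASCII letters bounded by word
# boundaries, rejected when it is immediately followed by two periods.
WORD = re.compile(r"\b[a-zA-Z]+\b(?!\.{2})")

def extract_words(sentence):
    """Return the words in `sentence`, skipping any word followed by '..'."""
    return WORD.findall(sentence)

# The pattern from the answer still finds 'text' inside "text.." because the
# engine backtracks to 'tex' and lets [^.] match the final 't'.
print(re.findall(r"\w{2,}\.?[^.]", "text.."))   # ['text']

# The lookahead version skips the word before "..", but keeps "text." and "Done!".
print(extract_words("Some text.. and more text. Done!"))
# ['Some', 'and', 'more', 'text', 'Done']
```

Because the lookahead only inspects the following characters, a single trailing `.`, `!` or `?` is left out of the match and needs no special handling.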
You should also consider that a sentence does not always end with . but may also end with ! or ? and the like.
I usually use rubulator.com to quickly test a regexp.
Upvotes: 0
Reputation: 57936
An unescaped dot matches any character. You must escape it, as in:
\.?
Maybe you want an expression like this:
\w+\.?
or
\p{L}+\.?
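To see what these two patterns return, here is a small Python sketch. Two assumptions to note: Python's built-in `re` module does not support `\p{L}`, so the character class `[^\W\d_]` is swapped in as an approximation of "any letter", and the name `LETTERS` is invented for this example.

```python
import re

# \w+\.? keeps at most one trailing period as part of the match.
# Note that the word before ".." is still returned, with one period attached.
print(re.findall(r"\w+\.?", "some text.. here"))
# ['some', 'text.', 'here']

# Approximation of \p{L}+\.? for Python's re module: word characters
# minus digits and the underscore, followed by an optional period.
LETTERS = re.compile(r"[^\W\d_]+\.?")
print(LETTERS.findall("caf\u00e9 na\u00efve."))
# ['café', 'naïve.']
```

In engines that do support `\p{L}` (for example .NET, Java or PCRE), the pattern from the answer can be used as written.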
Upvotes: 1
Reputation: 120927
You need to add \.? (and not .?) because the period has special meaning in regexes.
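A quick way to see the difference, sketched in Python with a made-up input string: the unescaped version lets the optional dot swallow whatever character follows the word, while the escaped version only accepts a literal period.

```python
import re

text = "ab cd."

# Unescaped: '.?' matches any single character, so the space after "ab"
# becomes part of the first match.
unescaped = re.findall(r"[a-z]+.?", text)
print(unescaped)   # ['ab ', 'cd.']

# Escaped: '\.?' matches only a literal period, zero or one times.
escaped = re.findall(r"[a-z]+\.?", text)
print(escaped)     # ['ab', 'cd.']
```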
Upvotes: 0