TheDude

Reputation: 197

C# Regex: only letters followed by an optional

I am looking for a way to get words out of a sentence. I am pretty far with the following expression:

\b([a-zA-Z]+?)\b

but in some cases it matches a word that I do not want matched, e.g. a word followed by more than one period, like "text..". So in my regex I want the period to appear at the end of a word zero or one time. Inserting \.? did not do the trick, and variations on it have not yielded anything useful either.

Hope someone can help!

Upvotes: 0

Views: 296

Answers (3)

bw_üezi

Reputation: 4564

To avoid a match on your example "text..", it is not enough to add \.? to check whether the first character after the word is a dot; you also need to look one character further and check the second character after the word.

I ended up with something like this: \w{2,}\.?[^.]
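As a hedged alternative to the pattern above: that pattern consumes the character after the word, and because \w{2,} can backtrack (matching "tex" and then letting [^.] take the final "t"), it may still match inside "text..". A minimal sketch using a negative lookahead avoids both issues. The class and method names (WordSketch, ExtractWords) and the sample sentence are made up for illustration, and the sketch assumes you want the word itself without the trailing period.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class WordSketch
{
    // Illustrative pattern (not from the original post): a run of ASCII letters
    // delimited by word boundaries, rejected when two periods follow immediately.
    private static readonly Regex WordPattern = new Regex(@"\b[a-zA-Z]+\b(?!\.\.)");

    // Returns every word that is followed by at most one period.
    public static List<string> ExtractWords(string sentence)
    {
        var words = new List<string>();
        foreach (Match m in WordPattern.Matches(sentence))
        {
            words.Add(m.Value);
        }
        return words;
    }

    public static void Main()
    {
        var words = ExtractWords("Hello world. This is text.. and more...");
        Console.WriteLine(string.Join(", ", words));
        // prints: Hello, world, This, is, and
    }
}
```

The lookahead only inspects the two characters after the word and does not consume them, so a word at the very end of the input still matches.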

You should also consider that a sentence does not always end with a period; it can also end with ! or ? and the like.

I usually use rubulator.com to quickly test a regexp.

Upvotes: 0

Rubens Farias

Reputation: 57936

An unescaped dot matches any single character. To match a literal period you must escape it:

\.?

Maybe you want an expression like this:

\w+\.?

or

\p{L}+\.?
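To see how the two suggestions differ, here is a small sketch that runs both against the same input. The helper name FindAll and the sample string are assumptions made for the example. In .NET, \w also accepts digits and underscores, while \p{L} accepts letters only (including non-ASCII ones). Note that neither pattern on its own rejects a word followed by two periods.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class ClassComparison
{
    // Hypothetical helper: returns the text of every match of pattern in input.
    public static string[] FindAll(string pattern, string input)
    {
        return Regex.Matches(input, pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        // \u00EF is "ï" and \u00E9 is "é"; escapes keep the source file ASCII-only.
        string input = "na\u00EFve caf\u00E9_1 test.";

        // \w treats the underscore and the digit as part of the word:
        // matches are "naïve", "café_1", "test."
        Console.WriteLine(string.Join(" | ", FindAll(@"\w+\.?", input)));

        // \p{L} stops at the underscore:
        // matches are "naïve", "café", "test."
        Console.WriteLine(string.Join(" | ", FindAll(@"\p{L}+\.?", input)));
    }
}
```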

Upvotes: 1

Klaus Byskov Pedersen

Reputation: 120927

You need to add \.? (and not .?) because the period has special meaning in regexes.
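The difference is easy to check with a short sketch. The helper name FindAll and the input "ab cd." are made up for the example; the letter class is borrowed from the question.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class DotEscaping
{
    // Hypothetical helper: returns the text of every match of pattern in input.
    public static string[] FindAll(string pattern, string input)
    {
        return Regex.Matches(input, pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        string input = "ab cd.";

        // Unescaped dot: .? matches any character, so the space after "ab" is swallowed.
        Console.WriteLine(string.Join("|", FindAll(@"[a-zA-Z]+.?", input)));
        // prints: ab |cd.

        // Escaped dot: \.? matches only a literal period.
        Console.WriteLine(string.Join("|", FindAll(@"[a-zA-Z]+\.?", input)));
        // prints: ab|cd.
    }
}
```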

Upvotes: 0
