Tucker

Reputation: 7362

Match Regular Expression if string contains exactly N occurrences of a character

I'd like a regular expression to match a string only if it contains a character that occurs a predefined number of times.

For example, I want to match all strings that contain the character "_" exactly 3 times:

So "a_b_c_d" would pass
"a_b" would fail
"a_b_c_d_e" would fail

Does someone know a simple regular expression that would satisfy this?

Thank you

Upvotes: 4

Views: 10161

Answers (4)

Rado

Reputation: 8963

This should do it:

^[^_]*_[^_]*_[^_]*_[^_]*$
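A quick way to check this pattern against the three examples from the question is a small Python sketch. Python is only an assumption here, since the question does not name a language, and the helper name `has_three_underscores` is made up for illustration:

```python
import re

# The pattern from the answer above: three underscores, each surrounded by
# runs of non-underscore characters, anchored at both ends of the string.
THREE_UNDERSCORES = re.compile(r"^[^_]*_[^_]*_[^_]*_[^_]*$")


def has_three_underscores(text: str) -> bool:
    """Return True if text contains exactly three underscores."""
    return THREE_UNDERSCORES.match(text) is not None


for sample in ("a_b_c_d", "a_b", "a_b_c_d_e"):
    print(sample, has_three_underscores(sample))
```

Only "a_b_c_d" should be reported as a match; the other two have too few or too many underscores.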

Upvotes: 1

Skippy le Grand Gourou

Reputation: 7704

Elaborating on Rado's answer, which is so far the most versatile but could be a pain to write if there are more occurrences to match:

^([^_]*_){3}[^_]*$

It will match entire strings (from the beginning ^ to the end $) that contain exactly 3 ({3}) repetitions of the pattern made of 0 or more (*) non-underscore characters ([^_]) followed by one underscore (_), the whole being followed by 0 or more non-underscore characters ([^_]*, again).

Of course, one could alternatively group the other way round, since in our case the pattern is symmetric:

^[^_]*(_[^_]*){3}$
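Because the count only appears inside {3}, the same shape can be generated for any character and any N. Here is a minimal Python sketch of that idea; the language choice and the helper name `exact_count_pattern` are assumptions for illustration, not part of the original answer:

```python
import re


def exact_count_pattern(char: str, n: int) -> re.Pattern:
    """Build a pattern matching strings with exactly n occurrences of char.

    Hypothetical helper; it follows the ^([^c]*c){n}[^c]*$ shape shown above.
    """
    if len(char) != 1:
        raise ValueError("char must be a single character")
    if n < 0:
        raise ValueError("n must be non-negative")
    c = re.escape(char)  # escape so characters like '^' or ']' stay literal
    return re.compile(rf"^(?:[^{c}]*{c}){{{n}}}[^{c}]*$")


three_underscores = exact_count_pattern("_", 3)
for sample in ("a_b_c_d", "a_b", "a_b_c_d_e"):
    print(sample, bool(three_underscores.match(sample)))
```

The group is written as non-capturing, (?:...), since nothing is extracted from it; that is a stylistic choice and does not change what the pattern matches.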

Upvotes: 1

derekaug

Reputation: 2145

If your examples are the only possibilities (like a_b_c_...), then the others are fine, but I wrote one that will handle some other possibilities, such as:

a__b_adf
a_b_asfdasdfasfdasdfasf_asdfasfd
___
_a_b_b

Etc.

Here's my regex.

\b(_[^_]*|[^_]*_|_){3}\b

Upvotes: 0

mathematical.coffee

Reputation: 56915

For your example, you could do:

\b[a-z]*(_[a-z]*){3}[a-z]*\b

(with an ignore case flag).

You can play with it here

It says "match 0 or more letters, followed by '_[a-z]*' exactly three times, followed by 0 or more letters". The \b means "word boundary", i.e. "match a whole word".

Since I've used '*', this will match if there are exactly three "_" in the word, regardless of whether they appear at the start or end of the word. You can modify it if you need something stricter.

Also, I've assumed you want to match all words in a string with exactly three "_" in it.

That means that for the string "a_b a_b_c_d", the word "a_b_c_d" passes (but "a_b" fails).

If you mean that globally across the entire string you only want three "_" to appear, then use:

^[^_]*(_[^_]*){3}[^_]*$

This anchors the regex at the start and end of the string, making sure there are exactly three occurrences of "_" in it. (The trailing [^_]* is redundant, since the group already ends in [^_]*, but it is harmless.)
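To make the difference between the two readings concrete, here is a small Python sketch that runs both patterns from this answer on the same input. Python and the constant names are assumptions for illustration only:

```python
import re

# Word-level pattern from this answer (letters only, case-insensitive).
WORD_LEVEL = re.compile(r"\b[a-z]*(_[a-z]*){3}[a-z]*\b", re.IGNORECASE)
# Whole-string pattern from this answer.
WHOLE_STRING = re.compile(r"^[^_]*(_[^_]*){3}[^_]*$")

text = "a_b a_b_c_d"

# finditer + group(0) yields the full matched words; findall would return
# only the contents of the capturing group.
words = [m.group(0) for m in WORD_LEVEL.finditer(text)]
print(words)                                # ['a_b_c_d']
print(bool(WHOLE_STRING.match(text)))       # False: four "_" in total
print(bool(WHOLE_STRING.match("a_b_c_d")))  # True
```

The word-level pattern picks out "a_b_c_d" from the longer string, while the anchored pattern rejects the whole string because it contains four underscores overall.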

Upvotes: 5
