jerome

Reputation: 2089

Don't understand the result of regex

I would like to know why the regex below is accepting 1.

"((^G0{0,2}$)|(^T|^R0{0,2}$)){0,5}"

I would like my regex to accept the sequences G00, G01, T00, R00 any number of times. At the moment I'm only trying to have G00, T00, R00 any number of times, but my regex is also accepting 1 as input. The regex should also accept G, G0, T, T0, R, R0, but the goal is to have a sequence of 3 characters.

Upvotes: 0

Views: 98

Answers (3)

Right now, your regex can match the empty string, so it reports a zero-length match on any input, including 1.

(...){0,5}

can match ... zero times, so it finds a match in every string.
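
To see the zero-length match for yourself, here is a minimal sketch in Python (the question doesn't say which language or tool is in use, so Python's re module and the helper name describe are my assumptions). It runs the pattern from the question unchanged and prints what was actually matched.

```python
import re

# The pattern from the question, unchanged.
PATTERN = re.compile(r"((^G0{0,2}$)|(^T|^R0{0,2}$)){0,5}")

def describe(text):
    """Return the matched substring, or None if nothing matched (hypothetical helper)."""
    m = PATTERN.search(text)
    return None if m is None else m.group(0)

print(repr(describe("1")))    # '' : zero repetitions, so an empty match
print(repr(describe("G00")))  # 'G00'
```

The search on 1 succeeds, but the matched text is empty: the group was simply repeated zero times.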


Your specific requirement (to match only those four inputs) would probably want a regex like this:

^(?:G01|[GRT]00)$

http://rubular.com/r/BrlxDfGkdf

If you want to be able to get multiple matches per line, then just leave off the anchors ^ and $:

(?:G01)|[GRT]00

http://rubular.com/r/3ODzf08eT5
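
Where the anchors sit relative to the | matters, so here is a rough Python sketch comparing three variants (Python and the names ANCHORED, UNGROUPED, UNANCHORED and is_code are my own choices, not from the question). Alternation has the lowest precedence, so without a group around both branches, ^ applies only to the first branch and $ only to the second.

```python
import re

# Group around the whole alternation: ^ and $ apply to both branches.
ANCHORED = re.compile(r"^(?:G01|[GRT]00)$")
# No enclosing group: ^ binds only to G01, $ only to [GRT]00.
UNGROUPED = re.compile(r"^(?:G01)|[GRT]00$")
# No anchors at all: finds every code anywhere in the line.
UNANCHORED = re.compile(r"G01|[GRT]00")

def is_code(text):
    """True if the whole string is exactly one of G01, G00, R00, T00."""
    return ANCHORED.search(text) is not None

print(is_code("G01"))                           # True
print(is_code("G01xyz"))                        # False
print(UNGROUPED.search("G01xyz") is not None)   # True, trailing text slips through
print(UNANCHORED.findall("G00 T00 x R00 G01"))  # ['G00', 'T00', 'R00', 'G01']
```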

Upvotes: 1

Lily

Reputation: 316

I think it's because you allow 0-5 repetitions, so any input can match with zero repetitions. Why not force it to match at least once?

"((^G0{0,2}$)|(^T|^R0{0,2}$))+"

Upvotes: 0

Amadan

Reputation: 198324

The regexp is matching zero repetitions of the alternation, with match length 0. (If you repeat it 0 times, the ^ anchor does not fire, so it can match anywhere.) You should extract the anchors outside the repetition. Something like...

^(?:[GTR]\d{0,2})+$
-                    start
 ---            --   any number of repetitions (1+) of
    -----            any of "G", "T", or "R"
         -------     0-2 digits
                  -  end

If your main sequence is repeating, capture groups don't make any sense, so I've stripped them.
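
For a quick check of the pattern above, here is a small sketch assuming Python's re module (the question doesn't name a language, and is_sequence is just a helper name I made up). With the anchors outside the repetition and + instead of {0,5}, an input like 1 or the empty string no longer matches.

```python
import re

# Anchors sit outside the repeated group; + requires at least one repetition.
SEQUENCE = re.compile(r"^(?:[GTR]\d{0,2})+$")

def is_sequence(text):
    """True if text is one or more of G/T/R, each followed by 0-2 digits."""
    return SEQUENCE.match(text) is not None

for sample in ["G00", "G00T00R00", "G", "T0", "1", "", "G000"]:
    print(repr(sample), is_sequence(sample))
```

The first four samples are accepted; 1, the empty string, and G000 (three digits after the letter) are rejected. Note that \d accepts any digit, so this also lets through things like G57; swap in 0 or [01] if you want to be stricter.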

Upvotes: 2
