sdaffa23fdsf

Reputation: 307

python regex non-capture group handling

(1[0-9]{2})\s+(\w+(?:-\w+)+)\s+(\w+)\s+(\w+(?:-\w+)+)\s+(\w+)

is used to match the string

123    FEX-1-80  Online  N2K-C2248TP-1GE    SSDFDFWFw23r23

How come this works on regexr.com, but Python 3.5.1 can't find a match?

r'(1[0-9]{2})\s+(\w+(?:-\w+)+)\s+(\w+)\s+(\w+(?:-\w+))'

can match up to

123    FEX-1-80  Online  N2K-C2248TP

but the second hyphen - in group(4) is not matched

From what I understand, non-capture group character can appear more than once in the group, what went wrong here?

Upvotes: 1

Views: 1569

Answers (2)

Jan

Reputation: 43189

This is more of a comment than an answer, but I am posting it as an answer for the sake of clarity.
If you are relatively new to regular expressions, consider using verbose mode. With it, your expression becomes much more readable:

(1[0-9]{2})\s+     # three digits, the first one needs to be 1
(\w+(?:-\w+)+)\s+  # word characters (wcs), then one or more groups of - followed by wcs
(\w+)\s+           # another word
(\w+(?:-\w+)+)\s+  # same expression as above
(\w+)              # another word
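In Python, a commented layout like this only works when the pattern is compiled with the re.VERBOSE flag. A minimal sketch, assuming the sample line from the question (the names PATTERN, line and fields are just illustrative):

```python
import re

# The expression from the question, compiled with re.VERBOSE so that
# whitespace and "#" comments inside the pattern are ignored.
PATTERN = re.compile(r"""
    (1[0-9]{2})\s+      # three digits, the first one must be 1
    (\w+(?:-\w+)+)\s+   # word characters, then one or more "-word" parts
    (\w+)\s+            # another word
    (\w+(?:-\w+)+)\s+   # same shape as the second group
    (\w+)               # another word
""", re.VERBOSE)

line = "123    FEX-1-80  Online  N2K-C2248TP-1GE    SSDFDFWFw23r23"

match = PATTERN.search(line)
fields = match.groups() if match else None
print(fields)
# ('123', 'FEX-1-80', 'Online', 'N2K-C2248TP-1GE', 'SSDFDFWFw23r23')
```

Without the flag, the spaces and comments would be treated as literal parts of the pattern and the match would fail.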

Also, check whether your second and fourth expressions could be rewritten as [\w-]+. It is not equivalent to yours and will match other substrings as well, but in general try to avoid nested parentheses.
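To see how the two forms differ, here is a small comparison. The sample strings are made up for illustration and are not from the question:

```python
import re

strict = re.compile(r"\w+(?:-\w+)+")  # needs at least one hyphen between word parts
loose = re.compile(r"[\w-]+")         # any run of word characters and hyphens

# Hypothetical samples chosen to show where the two patterns disagree.
samples = ["FEX-1-80", "Online", "--", "N2K-"]

# Map each sample to (matches strict?, matches loose?) for the whole string.
results = {
    s: (bool(strict.fullmatch(s)), bool(loose.fullmatch(s)))
    for s in samples
}

for sample, (is_strict, is_loose) in results.items():
    print(f"{sample!r:12} strict={is_strict} loose={is_loose}")
```

Only FEX-1-80 satisfies the strict form, while the loose form accepts all four, including a plain word and a string of bare hyphens. Whether that matters depends on how clean your input is.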

Concerning your question: the second string cannot be matched by the full expression because every part of that expression is mandatory. The second example has no fifth field, so group 5 has nothing to match and the whole match fails.

See a demo on regex101.com.

Upvotes: 1

user94559

Reputation: 60153

This regular expression matches the full input string:

(1[0-9]{2})\s+(\w+(?:-\w+)+)\s+(\w+)\s+(\w+(?:-\w+)+)\s+(\w+)

This one doesn't:

(1[0-9]{2})\s+(\w+(?:-\w+)+)\s+(\w+)\s+(\w+(?:-\w+))

The latter is missing a + after the last non-capturing group, and it's missing the \s+(\w+) at the end that matches the SSDFDFWFw23r23 at the end of the input string.
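A quick way to check this in Python, as a sketch using the input string from the question (the variable names are my own):

```python
import re

line = "123    FEX-1-80  Online  N2K-C2248TP-1GE    SSDFDFWFw23r23"

# The full expression and the shortened one from the question.
full = r"(1[0-9]{2})\s+(\w+(?:-\w+)+)\s+(\w+)\s+(\w+(?:-\w+)+)\s+(\w+)"
short = r"(1[0-9]{2})\s+(\w+(?:-\w+)+)\s+(\w+)\s+(\w+(?:-\w+))"

full_match = re.search(full, line)
short_match = re.search(short, line)

print(full_match.group(4))   # N2K-C2248TP-1GE
print(short_match.group(4))  # N2K-C2248TP
print(short_match.group(0))  # 123    FEX-1-80  Online  N2K-C2248TP

# The shortened pattern cannot cover the whole line.
print(re.fullmatch(short, line))  # None
```

re.search is happy with a partial match, which is why the shortened pattern still finds something, but it stops after the first hyphenated part of group 4.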

From what I understand, non-capture group character can appear more than once in the group, what went wrong here?

I'm not sure I follow. A non-capturing group is really just there to group a part of a regular expression.

(?:-\w+) or just -\w+ will both match a hyphen (-) followed by one or more "word" characters (\w+). It doesn't matter whether that regular expression is in a non-capturing group or not. If you want to match repetitions of that pattern, you can put the + quantifier after the non-capturing group, e.g. (?:-\w+)+. That pattern will match a string like -foo-bar-baz.

So the reason your second regular expression doesn't match the repeated pattern is that it lacks the + quantifier.
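Here is a small sketch of that difference, using the -foo-bar-baz example string (variable names are illustrative):

```python
import re

text = "-foo-bar-baz"

grouped = re.match(r"(?:-\w+)", text)    # non-capturing group, no repetition
bare = re.match(r"-\w+", text)           # same pattern without the group
repeated = re.match(r"(?:-\w+)+", text)  # group repeated one or more times

print(grouped.group(0))   # -foo
print(bare.group(0))      # -foo
print(repeated.group(0))  # -foo-bar-baz

# A non-capturing group adds nothing to the captured groups.
print(repeated.groups())  # ()
```

The group by itself changes nothing about what is matched. It only gives the + something to repeat.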

Upvotes: 0
