Reputation: 157
I'm trying to figure out a nice line of regex to match the following:
1:[any chars here except newlines]|1:[any chars here except newlines]...
I want my regex to be able to match any number of repeats of this type. The closest I've come to figuring it out is '(1:[^|]*\|)\1+', but it doesn't work for two reasons. First, it only matches strings that have an additional pipe at the end. Second, the backreference \1 requires the exact text of the first capture to be repeated, so every item would have to be identical.
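To make the two problems concrete, here is a minimal Python sketch. Python is an assumption on my part, and the helper name attempt_matches is purely illustrative. It runs the attempted pattern against a few sample strings:

```python
import re

# The attempted pattern from the question, unchanged.
ATTEMPT = re.compile(r'(1:[^|]*\|)\1+')

def attempt_matches(text):
    """Return True if the whole string matches the attempted pattern."""
    return ATTEMPT.fullmatch(text) is not None

# Identical items with a trailing pipe: \1 can repeat the captured text.
print(attempt_matches('1:foo|1:foo|'))
# Different items: \1 demands the literal text "1:foo|" again, so this fails.
print(attempt_matches('1:foo|1:bar|'))
# No trailing pipe: the last item lacks the "|" the group requires.
print(attempt_matches('1:foo|1:foo'))
```

Only the first call should report a match, which is the opposite of what I want.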
I could solve this using a split, but I just wondered if there was a nice way of doing this in a regular expression.
Upvotes: 2
Views: 3339
Reputation: 174696
You could do it like this:
^(1:[^|\n]*)(?:\|(?1))*$
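(?1) recurses into the first capturing group: it re-applies that group's pattern rather than the text it matched. Read more about recursive regex here.
For languages that don't support recursive regex, spell out the repeated part explicitly: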
^(?:1:[^|\n]*)(?:\|1:[^|\n]*)*$
Python code:
In [10]: import re
In [11]: s = """1:[any chars here except newlines]|1:[any chars here except newlines]
...: 1:[any chars here except newlines]
...: 1:foo
...: 1:foo|1:bar
...: 1:foo|1:bar|1:baz
...: 1:foo|1:bar|1:baz|1:bak
...: 1:foo|"""
In [14]: for i in re.findall(r'(?m)^(?:1:[^|\n]*)(?:\|1:[^|\n]*)*$', s):
...:     print(i)
...:
1:[any chars here except newlines]|1:[any chars here except newlines]
1:[any chars here except newlines]
1:foo
1:foo|1:bar
1:foo|1:bar|1:baz
1:foo|1:bar|1:baz|1:bak
Upvotes: 1
Reputation: 76646
Apply the quantifier to the entire group:
^(?:1:[^|\n]*\|?)+(?<!\|)$
^ asserts the position at the beginning of the string. The group then matches 1: followed by any characters that are not | or a newline, zero or more times (indicated by the *), and then an optional | (indicated by the ?). This entire group can be repeated one or more times (indicated by the +). The (?<!\|) is a negative lookbehind that asserts that the last character is not a |. $ asserts the position at the end of the string.
It matches all of these:
1:foo
1:foo|1:bar
1:foo|1:bar|1:baz
1:foo|1:bar|1:baz|1:bak
But it will not match
1:foo|
and similar.
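A quick way to check this yourself is the following Python sketch. Python and the helper name is_valid are my assumptions, not part of the question. It applies the pattern to the samples above plus one string with the wrong prefix:

```python
import re

# The pattern from this answer, unchanged.
PATTERN = re.compile(r'^(?:1:[^|\n]*\|?)+(?<!\|)$')

def is_valid(line):
    """Return True if the line is one or more '1:...' items joined by '|'."""
    return PATTERN.search(line) is not None

samples = ['1:foo', '1:foo|1:bar', '1:foo|1:bar|1:baz', '1:foo|', '2:foo']
for sample in samples:
    # Print each sample next to the result of the check.
    print(repr(sample), is_valid(sample))
```

The first three samples should be accepted; '1:foo|' is rejected by the lookbehind and '2:foo' by the missing 1: prefix.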
Upvotes: 5