Reputation: 348
I have 3 values that I'm trying to match: foo, bar and 123. However, I would like to match them only if they can be matched twice.
In the following line:
foo;bar;123;foo;123;
since bar is not present twice, it would only match foo and 123:
foo;bar;123;foo;123;
I understand how to specify exactly two matches with (foo|bar|123){2}; however, I need to use backreferences in order to make it work in my example.
I'm struggling to put the two concepts together into a working solution.
Upvotes: 3
Views: 268
Reputation: 43169
You could use
(?<=^|;)([^\n;]+)(?=.*(?:(?<=^|;)\1(?=;|$)))
(?<=^|;) # pos. lookbehind, either start of string or ;
([^\n;]+) # not ; nor newline 1+ times
(?=.* # pos. lookahead
(?:
(?<=^|;) # same pattern as above
\1 # group 1
(?=;|$) # end or ;
)
)
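A minimal sketch of how the pattern above could be tried in Python. One assumption to flag: Python's built-in re module rejects (?<=^|;) because the two lookbehind branches differ in width, so the boundary is rewritten here as (?:^|(?<=;)), which should behave the same way. The helper name find_repeated is made up for illustration.

```python
import re

# Adapted boundary: (?:^|(?<=;)) stands in for (?<=^|;), which Python's re
# rejects with "look-behind requires fixed-width pattern".
BOUNDARY = r"(?:^|(?<=;))"

PATTERN = re.compile(
    BOUNDARY
    + r"([^\n;]+)"                          # a value at the start or right after ';'
    + r"(?=.*" + BOUNDARY + r"\1(?=;|$))"   # the same value must occur again later
)

def find_repeated(line):
    """Return each value in `line` that occurs again later in the same line."""
    return PATTERN.findall(line)

print(find_repeated("foo;bar;123;foo;123;"))  # ['foo', '123']
```

Note that only the earlier occurrence of each repeated value is reported, because the lookahead only searches forward; the final foo and 123 have no later twin and are skipped.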
\b # word boundary
([^;]+) # anything not ; 1+ times
\b # another word boundary
(?=.*\1) # pos. lookahead, making sure the pattern is found again
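To see what this shorter pattern does, here is a small Python sketch. The second input string is made up for illustration: since \1 is not anchored inside the lookahead, a value may be "found again" inside a longer value, which may or may not be what you want.

```python
import re

# Shorter pattern: word boundaries around the value, unanchored lookahead.
SIMPLE = re.compile(r"\b([^;]+)\b(?=.*\1)")

# The sample line from the question: foo and 123 each occur again later.
print(SIMPLE.findall("foo;bar;123;foo;123;"))  # ['foo', '123']

# Hypothetical input: "foo" is reported because it reappears inside "foobar".
print(SIMPLE.findall("foo;foobar;"))           # ['foo']
```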
Alternatively, split on ; programmatically and use some programming logic afterwards.
Here is a demo in Python, for example (it can be adjusted for other languages as well):
from collections import Counter

string = """
foo;bar;123;foo;123;
foo;bar;foo;bar;
foo;foo;foo;bar;bar;
"""

# keep every element that occurs exactly twice within its line
twins = [element
         for line in string.split("\n")
         for element, times in Counter(line.split(";")).most_common()
         if times == 2]
print(twins)
Upvotes: 2
Reputation: 171
Making sure to allow room for text that may occur in between matches with a ".*", this should match any of your values that occur at least twice:
(foo|bar|123).*\1
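A quick Python check of this pattern on the line from the question, as a sketch rather than a definitive test. It suggests one caveat: the .* consumes the text between the two occurrences, so a value sitting inside that span (here 123) is not reported on its own. The lookahead variant shown second is an assumption added for comparison, not part of this answer.

```python
import re

line = "foo;bar;123;foo;123;"

# Pattern from the answer: the match swallows everything up to the repeat.
consuming = [m.group(0) for m in re.finditer(r"(foo|bar|123).*\1", line)]
print(consuming)  # ['foo;bar;123;foo']

# Comparison variant (assumption): check for the repeat without consuming it.
lookahead = re.findall(r"(foo|bar|123)(?=.*\1)", line)
print(lookahead)  # ['foo', '123']
```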
Upvotes: 1