Mr. Z.

Reputation: 348

If pattern repeats two times (nonconsecutive) match both patterns, regex

I have 3 values that I'm trying to match: foo, bar and 123. However, I would like to match them only if they can be matched twice.

In the following line:

foo;bar;123;foo;123;

since bar is not present twice, only foo and 123 would be matched (both occurrences of each):

foo;bar;123;foo;123;

I understand how to match exactly two consecutive occurrences with (foo|bar|123){2}; however, I need to use backreferences to make it work in my example. I'm struggling to put the two concepts together into a working solution.

Upvotes: 3

Views: 268

Answers (2)

Jan

Reputation: 43169

You could use

(?<=^|;)([^\n;]+)(?=.*(?:(?<=^|;)\1(?=;|$)))


Broken down, this is

(?<=^|;)         # pos. lookbehind, either start of string or ;
([^\n;]+)        # neither ; nor newline, 1+ times
(?=.*            # pos. lookahead
    (?:
        (?<=^|;) # same pattern as above
        \1       # group 1
        (?=;|$)  # end or ;
     )
)

A simpler alternative, \b([^;]+)\b(?=.*\1), relies on word boundaries instead:

\b       # word boundary
([^;]+)  # anything not ; 1+ times
\b       # another word boundary
(?=.*\1) # pos. lookahead, making sure the pattern is found again

See a demo on regex101.com.
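
For anyone trying the lookaround approach in Python, here is a minimal sketch. Python's standard re module rejects the lookbehind (?<=^|;) because its alternatives differ in width, so the sketch swaps in (?<![^;]) for "start of string or ;" and (?![^;]) for "; or end of string". The names PATTERN and first_of_repeats are made up for illustration. The extra (?![^;]) directly after the group is an added assumption, meant to stop the group from backtracking to a partial field. The sketch works on a single line and reports each field that is followed by a later identical field, not both occurrences.

```python
import re

# Adapted pattern (an assumption, not the answer's exact regex):
#   (?<![^;])  previous char is ";" or we are at the start of the string
#   (?![^;])   next char is ";" or we are at the end of the string
PATTERN = re.compile(
    r"(?<![^;])([^\n;]+)(?![^;])"      # one whole field
    r"(?=.*(?<![^;])\1(?![^;]))"       # the same whole field appears again later
)

def first_of_repeats(line):
    """Return the fields in `line` that occur again further to the right."""
    return PATTERN.findall(line)

print(first_of_repeats("foo;bar;123;foo;123;"))  # ['foo', '123']
print(first_of_repeats("foo;fo;"))               # []
```

The second call shows why whole-field boundaries matter: without them, "fo" could be matched inside "foo".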


Otherwise - as said in the comments - split on the ; programmatically and use some programming logic afterwards.

Find a demo in Python for example (can be adjusted for other languages as well):

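from collections import Counter

# Sample input: one record per line, fields separated by ";"
string = """
foo;bar;123;foo;123;
foo;bar;foo;bar;
foo;foo;foo;bar;bar;
"""

# Collect, line by line, every field that occurs exactly twice
twins = [element
         for line in string.split("\n")
         for element, times in Counter(line.split(";")).most_common()
         if times == 2]
print(twins)  # ['foo', '123', 'foo', 'bar', 'bar']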
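
Since the question asks to match both occurrences, the splitting idea can be extended to report where each repeated field sits in the line. This is only a sketch: the function name repeated_field_spans and its sep parameter are hypothetical, and it treats "twice" as "at least twice".

```python
from collections import Counter

def repeated_field_spans(line, sep=";"):
    """Return (value, start, end) for every occurrence of a field
    that appears at least twice in `line`."""
    fields = line.split(sep)
    counts = Counter(f for f in fields if f)  # ignore empty fields
    spans = []
    pos = 0  # index in `line` where the current field starts
    for field in fields:
        if field and counts[field] >= 2:
            spans.append((field, pos, pos + len(field)))
        pos += len(field) + len(sep)
    return spans

print(repeated_field_spans("foo;bar;123;foo;123;"))
# [('foo', 0, 3), ('123', 8, 11), ('foo', 12, 15), ('123', 16, 19)]
```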

Upvotes: 2

Oliver Too Eh

Reputation: 171

If you allow room for text that may occur between the matches with a ".*", this should match any of your values that occur at least twice:

(foo|bar|123).*\1
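
A quick way to see what this pattern returns, sketched in Python. The variable names are illustrative. The second pattern is a variant that is not part of the answer above: it moves the repeat check into a lookahead so that the text between the two occurrences is not consumed.

```python
import re

line = "foo;bar;123;foo;123;"

# Original pattern: a single match running from the first value to its last repeat.
spanning = re.search(r"(foo|bar|123).*\1", line)
print(spanning.group(0))  # foo;bar;123;foo
print(spanning.group(1))  # foo

# Variant (an assumption, not from the answer): the lookahead leaves the
# text between the repeats unconsumed, so later values can still be found.
repeated = re.findall(r"(foo|bar|123)(?=.*\1)", line)
print(repeated)  # ['foo', '123']
```

Neither pattern anchors on the ; separators, so a value such as 123 would also be found inside a longer field like 1234.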

Upvotes: 1
