David Urry

Reputation: 825

How do I use Java Regex to find all repeating character sequences in a string?

I am parsing an arbitrary string in Java and want to use a regex to find the repeating sequences in it.

Consider the string:

aaabbaaacccbb

I'd like to find a regular expression that will find all the matches in the above string:

aaabbaaacccbb
^^^  ^^^

aaabbaaacccbb
   ^^      ^^

What regular expression will check a string for any repeating sequences of characters and return those sequences as groups, such that group 1 = aaa and group 2 = bb? Note that the string above is only an example; any repeating characters are valid: RonRonJoeJoe ... ... ,, ,,...,,

Upvotes: 10

Views: 14516

Answers (5)

user557597

You could disregard overlap and capture inside a lookahead, which consumes nothing:

// overlapped 1 or more chars
(?=(\w{1,}).*\1)
// overlapped 2 or more chars
(?=(\w{2,}).*\1)
// overlapped 3 or more chars, etc ..
(?=(\w{3,}).*\1)

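Or you could consume the captured text, so the matches do not overlap (written without spaces here, since a space is matched literally in a Java regex unless Pattern.COMMENTS is set):

// 1 or more chars
(?=(\w{1,}).*\1)\1
// 2 or more chars
(?=(\w{2,}).*\1)\1
// 3 or more chars, etc ..
(?=(\w{3,}).*\1)\1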
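
To see the difference between the two families of patterns above, here is a minimal Java sketch using the "2 or more chars" variants. It assumes the patterns are compiled without free-spacing mode (so no spaces inside them); the class name LookaheadRepeats and the helper captures are made up for illustration and are not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LookaheadRepeats {
    // Zero-width: captures inside the lookahead and consumes nothing,
    // so the next search starts one character later and may overlap.
    static final Pattern OVERLAPPED = Pattern.compile("(?=(\\w{2,}).*\\1)");

    // Same lookahead, then consumes the captured text, so matches do not overlap.
    static final Pattern CONSUMING = Pattern.compile("(?=(\\w{2,}).*\\1)\\1");

    // Collects group 1 of every match, in the order found.
    static List<String> captures(Pattern p, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "aaabbaaacccbb";
        System.out.println(captures(OVERLAPPED, s)); // [aaa, aa, bb]
        System.out.println(captures(CONSUMING, s));  // [aaa, bb]
    }
}
```

The extra "aa" in the first result starts at index 1, inside the earlier "aaa"; the consuming variant skips it because the matcher has already moved past those characters.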

Upvotes: 0

Trevor Freeman

Reputation: 7242

The code below should meet all the requirements. It combines a couple of the other answers here and prints every substring that is repeated anywhere else in the string.

I set it to only return substrings of at least 2 characters, but it can be easily changed to single characters by changing "{2,}" in the regex to "+".

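import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test
{
  public static void main(String[] args)
  {
    String s = "RonSamJoeJoeSamRon";
    // Group 1: 2+ non-whitespace chars; the lookahead requires the same text to occur again later.
    Matcher m = Pattern.compile("(\\S{2,})(?=.*?\\1)").matcher(s);
    while (m.find())
    {
      // The pattern has one capturing group, so this prints group 1 only.
      for (int i = 1; i <= m.groupCount(); i++)
      {
        System.out.println(m.group(i));
      }
    }
  }
}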

Output:
Ron
Sam
Joe

Upvotes: 3

Guillaume Polet

Reputation: 47637

This does it:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {
    public static void main(String[] args) {
        String s = "aaabbaaacccbb";
        find(s);
        String s1 = "RonRonRonJoeJoe .... ,,,,";
        find(s1);
        System.err.println("---");
        String s2 = "RonBobRonJoe";
        find(s2);
    }

    private static void find(String s) {
        Matcher m = Pattern.compile("(.+)\\1+").matcher(s);
        while (m.find()) {
            System.err.println(m.group());
        }
    }
}

OUTPUT:

aaa
bb
aaa
ccc
bb
RonRonRon
JoeJoe
....
,,,,
---
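
If you also want the repeating unit and how many times it repeats, group 1 holds the unit and group() holds the whole run. The sketch below is a variation, not the code above: it swaps in a reluctant (.+?) so the shortest unit is preferred (with the greedy version, "...." is captured as ".." twice). The names RepeatRuns and runs are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RepeatRuns {
    // Reluctant (.+?) tries the shortest unit first; \1+ then takes
    // as many immediately following copies of it as possible.
    private static final Pattern RUN = Pattern.compile("(.+?)\\1+");

    // Describes each run of adjacent repeats as "<unit> x<count>".
    static List<String> runs(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = RUN.matcher(input);
        while (m.find()) {
            String unit = m.group(1);
            int count = m.group().length() / unit.length();
            out.add(unit + " x" + count);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(runs("RonRonRonJoeJoe .... ,,,,")); // [Ron x3, Joe x2, . x4, , x4]
        System.out.println(runs("RonBobRonJoe"));              // []
    }
}
```

Like the original pattern, this only finds repeats that sit directly next to each other, which is why "RonBobRonJoe" yields nothing.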

Upvotes: 9

Reverend Gonzo

Reputation: 40871

This seems to work, though it gives subsequences as well (to be fair, it was built off of Guillaume's code):

public static void main(final String[] args) {
    // final String s = "RonRonJoeJoe";
    // final String s = "RonBobRonJoe";
    final String s = "aaabbaaacccbb";

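    // Group 1 is any substring that occurs again later in the input.
    final Pattern p = Pattern.compile("(.+).*\\1");

    final Matcher m = p.matcher(s);
    int start = 0;
    // Resume each search right after group 1, not after the whole match,
    // because the whole match extends through the later occurrence.
    while (m.find(start)) {
        System.out.println(m.group(1));
        start = m.end(1);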
    }
}

Upvotes: 0

anubhava

Reputation: 786291

You can use this regex based on a positive lookahead (shown with the backslashes doubled, as in a Java string literal):

((\\w)\\2+)(?=.*\\1)

Code:

String elem = "aaabbaaacccbb";
String regex = "((\\w)\\2+)(?=.*\\1)";
Pattern p = Pattern.compile(regex);
Matcher matcher = p.matcher(elem);
for (int i = 1; matcher.find(); i++) {
    System.out.println("Group # " + i + " got: " + matcher.group(1));
}

OUTPUT:

Group # 1 got: aaa
Group # 2 got: bb
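
For readability, the same idea can be written with named groups, which Java supports via (?<name>...) and \k<name>. This is only a sketch: the group names run and ch and the class NamedGroupRepeats are my own choices. As with the regex above, it only finds runs of a single repeated word character, so it would not pick up something like RonRon.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NamedGroupRepeats {
    // "ch" is one word character, "run" is that character repeated 2+ times,
    // and the lookahead requires the same run to appear again later.
    private static final Pattern P =
            Pattern.compile("(?<run>(?<ch>\\w)\\k<ch>+)(?=.*\\k<run>)");

    // Returns every run that occurs again later in the input.
    static List<String> find(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = P.matcher(input);
        while (m.find()) {
            out.add(m.group("run"));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(find("aaabbaaacccbb")); // [aaa, bb]
    }
}
```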

Upvotes: 2
