Reputation: 1482
Is it possible to detect repeated number patterns with a regular expression?
So for example, if I had the following string "034503450345", would it be possible to match the repeated sequence 0345? I have a feeling this is beyond the scope of regex, but I thought I would ask here anyway to see if I have missed something.
Upvotes: 10
Views: 14072
Reputation: 112160
This expression will match one or more repeating groups:
(.+)(?=\1+)
Here is the same expression broken down, (using commenting so it can still be used directly as a regex).
(?x) # enable regex comment mode
( # start capturing group
.+ # one or more of any character (excludes newlines by default)
) # end capturing group
(?= # begin lookahead
\1+ # match one or more of the first capturing group
) # end lookahead
To match a specific pattern, change the .+
to that pattern, e.g. \d+
for one or more numbers, or \d{4,}
to match 4 or more numbers.
To match a specific number of the pattern, change \1+
, e.g to \1{4}
for four repetitions.
To allow the repetition to not be next to each other, you can add .*?
inside the lookahead.
Upvotes: 22
Reputation: 716
Use regex repetition: bar{2,} looks for text with two or more bar: barbar barbarbar ...
Upvotes: -1
Reputation: 23808
This is the C# code, that uses the backreference construct to find repeated digits. It will work with 034503450345, 123034503450345, 034503450345345, 232034503450345423. The regex is much easier and clearer to understand.
/// <summary>
/// Assigns repeated digits to repeatedDigits, if the digitSequence matches the pattern
/// </summary>
/// <returns>true if success, false otherwise</returns>
public static bool TryGetRepeatedDigits(string digitSequence, out string repeatedDigits)
{
repeatedDigits = null;
string pattern = @"^\d*(?<repeat>\d+)\k<repeat>+\d*$";
if (Regex.IsMatch(digitSequence, pattern))
{
Regex r = new Regex(pattern, RegexOptions.IgnoreCase | RegexOptions.Compiled);
repeatedDigits = r.Match(digitSequence).Result("${repeat}");
return true;
}
else
return false;
}
Upvotes: 2
Reputation: 83609
Just to add a note to the (correct) answer from RichieHindle:
Note that while Python's regexp implementation (and many others, such as Perl's) can do this, this is no longer a regular expression in the narrow sense of the word.
Your example is not a regular language, hence cannot be handled by a pure regular expression. See e.g. the excellent Wikipedia article for details.
While this is mostly only of academic interest, there are some practical consequences. Real regular expressions can make much better guarantees for maximum runtimes than in this case. So you could get performance problems at some point.
Not to say that it's not a good solution, but you should realize that you're at the limit of what regular expressions (even in extended form) are capable of, and might want to consider other solutions in case of problems.
Upvotes: 9
Reputation: 281485
Yes, you can - here's a Python test case
import re
print re.search(r"(\d+).*\1", "8034503450345").group(1)
# Prints 0345
The regular expression says "find some sequence of digits, then any amount of other stuff, then the same sequence again."
On a barely-related note, here's one of my favourite regular expressions - a prime number detector:
import re
for i in range(2, 100):
if not re.search(r"^(xx+)\1+$", "x"*i):
print i
Upvotes: 11