janneb

Reputation: 37228

Match same number of repetitions as previous group

I'm trying to match strings that are repeated the same number of times, like

abc123
abcabc123123
abcabcabc123123123
etc.

That is, I want the second group (123) to be matched the same number of times as the first group (abc). Something like

(abc)+(123){COUNT THE PREVIOUS GROUP MATCHED}

This is using the Rust regex crate https://docs.rs/regex/1.4.2/regex/

Edit: As I feared, and as the answers and comments pointed out, this cannot be represented in a regex, at least not without some sort of recursion, which the Rust regex crate does not currently support. Since I know the input length is limited in this case, I just generated a rule like

(abc123)|(abcabc123123)|(abcabcabc123123123)

Horribly ugly, but got the job done, as this wasn't "serious" code, just a fun exercise.
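For a bounded number of repetitions, a rule like the one above could be generated rather than typed by hand. The following is only a minimal sketch: the function name `bounded_pattern` and its parameters are hypothetical, and it assumes that neither string contains regex metacharacters, since nothing is escaped. It also anchors the pattern and uses a single non-capturing group, which differs slightly from the hand-written rule.

```rust
/// Builds an anchored alternation covering every repetition count from 1 up
/// to `max_reps`, e.g. `^(?:abc123|abcabc123123)$` for `max_reps == 2`.
/// Assumption: `a` and `b` contain no regex metacharacters (nothing is escaped).
fn bounded_pattern(a: &str, b: &str, max_reps: usize) -> String {
    // One alternative per repetition count: n copies of `a`, then n copies of `b`.
    let alternatives: Vec<String> = (1..=max_reps)
        .map(|n| format!("{}{}", a.repeat(n), b.repeat(n)))
        .collect();
    // Anchor both ends so the whole input must be one of the alternatives.
    format!("^(?:{})$", alternatives.join("|"))
}
```

The resulting string can then be handed to whatever regex engine is in use. The pattern grows quadratically with `max_reps`, so this only makes sense for small bounds.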

Upvotes: 2

Views: 946

Answers (4)

Luis Colorado

Reputation: 12698

There is an extension to regexp libraries, dating back to the early Unix days, called a back-reference: it lets you match, literally, the text that an earlier group has already matched.

For example, let's say you have a number that must be equal to another one (e.g. the score of a soccer game, where you are interested only in draws between the two teams). You can use the following regexp:

([0-9][0-9]*) - \1

Suppose we feed it "123 - 123": it will match. If we use "123 - 12" instead, it will not match, because \1 is not the same string as what was matched in the first group. Once the first group has matched, the engine effectively treats \1 as the literal sequence of characters that the group captured.
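The Rust regex crate does not support back-references, so in Rust the same check has to be done by hand. The sketch below is one possible way to do it with the standard library only. The name `is_draw` is hypothetical, and unlike an unanchored regex search it tests the whole string rather than looking for a match somewhere inside it.

```rust
/// Hand-written equivalent of `^([0-9][0-9]*) - \1$`:
/// digits, the separator " - ", then exactly the same digits again.
fn is_draw(score: &str) -> bool {
    match score.split_once(" - ") {
        Some((home, away)) => {
            // The "group" must be a non-empty run of ASCII digits...
            !home.is_empty()
                && home.bytes().all(|b| b.is_ascii_digit())
                // ...and the "back-reference" must repeat it literally.
                && home == away
        }
        // No separator at all, so the shape does not match.
        None => false,
    }
}
```

The comparison `home == away` plays the role of \1: it compares the captured text itself, not its length or how many times anything repeated.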

But there's a problem with your sample: there is no way to end the first group if you try

([0-9][0-9]*)\1

to match 123123, because the automaton cannot close the first group (you need at least one non-digit character to make the first group end).

But for example, this means that you can use:

\+\(([0-9][0-9]*)\)\1(-\1)*

and this will match phone numbers in the form

+(358)358-358-358

or

+(1)1-1-1-1-1-1-1

The number between the parentheses is captured as a sample, and the back-reference then builds a sequence of that same number separated by dashes. You can see the expression working in this demo.

Upvotes: 1

Ryszard Czech

Reputation: 18631

Building a pattern dynamically is also an option. Matching one, two or three nested abc and 123 is possible with

abc(?:abc(?:abc(?:)?123)?123)?123

See proof. The innermost (?:)? is redundant, since it matches no text; each (?:...)? matches an optional pattern.

Rust snippet:

fn main() {
    let a = "abc"; // Prefix
    let b = "123"; // Suffix
    let level = 3; // Recursion (repetition) level

    // Wrap the pattern built so far in one more optional layer per level.
    let mut result = String::new();
    for _ in 0..level {
        result = format!("{}(?:{})?{}", a, result, b);
    }
    println!("{}", result);
    // abc(?:abc(?:abc(?:)?123)?123)?123
}

Upvotes: 1

pretzelhammer

Reputation: 15135

As others have commented, I don't think it's possible to accomplish this in a single regex. If you can't guarantee the strings are well-formed then you'd have to validate them with the regex, capture each group, and then compare the group lengths to verify they are of equal repetitions. However, if it's guaranteed all strings will be well-formed then you don't even need to use regex to implement this check:

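fn matching_reps(string: &str, group1: &str, group2: &str) -> bool {
    // Assumes a well-formed string: panics if group2 never occurs.
    let group2_start = string.find(group2).unwrap();
    // Everything before group2_start consists of group1 repetitions...
    let group1_reps = group2_start / group1.len();
    // ...and everything from there on consists of group2 repetitions.
    let group2_reps = (string.len() - group2_start) / group2.len();
    group1_reps == group2_reps
}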

fn main() {
    assert_eq!(matching_reps("abc123", "abc", "123"), true);
    assert_eq!(matching_reps("abcabc123", "abc", "123"), false);
    assert_eq!(matching_reps("abcabc123123", "abc", "123"), true);
    assert_eq!(matching_reps("abcabc123123123", "abc", "123"), false);
}


Upvotes: 2

LeGEC

Reputation: 52046

Pure regular expressions are not able to represent that.

There may be some way to define back references, but I am not familiar with regexp syntax in Rust, and this would technically be a way to represent something more than a pure regular expression.

There is, however, a simple way to compute it:

  • use a regexp to make sure your string matches ^((abc)*)((123)*)$
  • if your string matches, take the two captured substrings and compare their lengths
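
The two steps above can be sketched without a regex dependency by consuming the repetitions by hand. This is only an illustration under stated assumptions: the names `count_leading` and `same_repetitions` are hypothetical, the second group is assumed not to start with the first one, and at least one repetition is required (as in the question's examples), whereas the regex above also accepts the empty string. It compares repetition counts rather than substring lengths, so it also works when the two groups have different lengths.

```rust
/// Counts how many times `unit` repeats at the start of `s` and returns that
/// count together with the unconsumed remainder.
fn count_leading<'a>(mut s: &'a str, unit: &str) -> (usize, &'a str) {
    // An empty unit would strip forever, so treat it as zero repetitions.
    if unit.is_empty() {
        return (0, s);
    }
    let mut count = 0;
    while let Some(rest) = s.strip_prefix(unit) {
        count += 1;
        s = rest;
    }
    (count, s)
}

/// True when `s` is n copies of `a` followed by n copies of `b`, with n >= 1.
fn same_repetitions(s: &str, a: &str, b: &str) -> bool {
    // Step 1: check the shape a...ab...b by consuming both runs.
    let (a_reps, rest) = count_leading(s, a);
    let (b_reps, rest) = count_leading(rest, b);
    // Step 2: nothing may be left over, and both runs must repeat equally often.
    rest.is_empty() && a_reps > 0 && a_reps == b_reps
}
```

For example, `same_repetitions("abcabc123123", "abc", "123")` holds, while `"abcabc123"` is rejected because the counts differ and `"abc123abc"` is rejected because of the leftover text.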

Upvotes: 1
