Jack

Reputation: 384

Make regex quantifier length depend on previous capture group

I'm hoping to use a regex to parse strings which begin with an integer n. After a space, there are n characters, after which there may be more text. I'm hoping to capture n and the n characters that follow. There are no constraints on these n characters. In other words, 5 hello world should match with the capture groups 5 and hello.

I tried the regex (\d+) .{\1} but it wouldn't compile, because its structure depends on the input.

Is there a way to get the regex compiler to do what I want, or do I have to parse this myself?

I'm using Rust's regex crate, if that matters. And if it's not possible with regex, is it possible with another, more sophisticated regex engine?

Thanks!

Upvotes: 1

Views: 267

Answers (2)

jdaz

Reputation: 6063

As @Cary Swoveland said in the comments, this is not possible in regex in one step without hard-coding the various possible lengths.
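
To make "hard-coding the various possible lengths" concrete, here is a rough sketch that only builds such a pattern as a string, with one alternation branch per supported length. The function name hard_coded_pattern and the max_len cap are hypothetical, chosen for illustration; the pattern grows with every length you want to support, which is why this approach is rarely practical.

```rust
/// Builds an alternation with one branch per supported length.
/// For max_len = 3 the result is: ^(?:1 (.{1})|2 (.{2})|3 (.{3}))
/// (hypothetical helper, not part of the regex crate)
fn hard_coded_pattern(max_len: usize) -> String {
    let branches: Vec<String> = (1..=max_len)
        // `{{` and `}}` are literal braces, `{n}` interpolates the length.
        .map(|n| format!("{n} (.{{{n}}})"))
        .collect();
    format!("^(?:{})", branches.join("|"))
}

fn main() {
    println!("{}", hard_coded_pattern(3));
}
```

Each branch gets its own capture group, so the caller would still have to find out which group actually matched.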

However, it is not too difficult to match the rest of the string and then take a prefix of it whose length is given by the matched number:

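use regex::Regex;

fn main() {
    // (?s) lets `.` match newlines too, since the payload is unconstrained.
    let re = Regex::new(r"(?s)(\d+) (.+)").unwrap();
    let test_str = "5 hello world";

    for cap in re.captures_iter(test_str) {
        // A count that fails to parse (e.g. overflow) falls back to 0.
        let length: usize = cap[1].parse().unwrap_or(0);
        // Take `length` chars (not bytes) so multi-byte UTF-8 stays intact.
        let short_match: String = cap[2].chars().take(length).collect();

        println!("{}", short_match); // hello
    }
}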

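If you know you'll only be dealing with ASCII characters (no accented or other multi-byte characters), you can use the simpler slice syntax let short_match = &cap[2][..length]; instead. Be aware that this slice panics if length is larger than the captured text or if it lands in the middle of a multi-byte character.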
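
For the "parse this myself" route raised in the question, a minimal sketch using only the standard library might look like the following. The function name parse_counted is hypothetical, and the choice to return None when fewer than n characters follow is an assumption about the desired behavior rather than something the question specifies.

```rust
/// Splits "<n> <text>" into (n, the first n characters of text).
/// Returns None if the prefix is not a plain decimal number, the space is
/// missing, or fewer than n characters follow (an assumed policy).
fn parse_counted(input: &str) -> Option<(usize, &str)> {
    let (digits, rest) = input.split_once(' ')?;
    // Reject empty prefixes and forms like "+5", which `parse` would accept.
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let n: usize = digits.parse().ok()?;
    // Byte offset just past the n-th character; using char_indices keeps
    // the slice on a UTF-8 character boundary.
    let end = match rest.char_indices().nth(n) {
        Some((idx, _)) => idx,
        None if rest.chars().count() == n => rest.len(),
        None => return None,
    };
    Some((n, &rest[..end]))
}

fn main() {
    println!("{:?}", parse_counted("5 hello world"));
}
```

Because it borrows from the input instead of collecting into a String, this version avoids an allocation, and it never panics on short or non-ASCII input.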

Upvotes: 3

tshiono

Reputation: 22042

If Perl is an option, you could try the following:

perl -e '
$str = "5 abcdefgh";
$str =~ /(\d+) ((??{".{".($^N)."}"}))/;
print "1st capture group = $1\n";
print "2nd capture group = $2\n";
print "whole capture group = $&\n";
'

Output:

1st capture group = 5
2nd capture group = abcde
whole capture group = 5 abcde

[Explanation]

  • (??{...}) is a postponed regular subexpression: Perl runs the code inside it at match time and uses the resulting string as a regex pattern.
  • The special variable $^N holds the text of the most recently closed capture group, which is 5 in this case.
  • The code (??{".{".($^N)."}"}) therefore evaluates to .{5}, a dot followed by the quantifier {5}, which matches exactly five characters (newlines excluded unless the /s modifier is used).

Upvotes: 0
