Reputation: 384
I'm hoping to use a regex to parse strings which begin with an integer n. After a space, there are n characters, after which there may be more text. I'm hoping to capture n and the n characters that follow. There are no constraints on these n characters. In other words, "5 hello world" should match with the capture groups "5" and "hello".
I tried the regex (\d+) .{\1} but it wouldn't compile, because its structure depends on the input.
Is there a way to get the regex compiler to do what I want, or do I have to parse this myself?
I'm using Rust's regex crate, if that matters. And if it's not possible with regex, is it possible with another, more sophisticated regex engine?
Thanks!
Upvotes: 1
Views: 267
Reputation: 6063
As @Cary Swoveland said in the comments, this is not possible in regex in one step without hard-coding the various possible lengths.
However, it is not too difficult to take a substring of the matched text whose length is given by the matched number:
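    use regex::Regex;

    fn main() {
        // (?s) lets `.` match newlines too, since the n characters are unconstrained.
        let re = Regex::new(r"(?s)(\d+) (.+)").unwrap();
        let test_str = "5 hello world";
        for cap in re.captures_iter(test_str) {
            // Group 1 is the count; fall back to 0 if it does not fit in a usize.
            let length: usize = cap[1].parse().unwrap_or(0);
            // Keep only the first `length` characters of group 2.
            let short_match: String = cap[2].chars().take(length).collect();
            println!("{}", short_match); // hello
        }
    }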
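If you know you'll only be dealing with ASCII characters (no Unicode, accent marks, etc.), then you can use the simpler slice syntax: let short_match = &cap[2][..length]; Note that this slice indexes bytes, so it panics if length is larger than the captured text or lands inside a multi-byte character.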
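On the "do I have to parse this myself?" part of the question: here is a minimal sketch that skips the regex crate and uses only the standard library. The function name parse_counted is made up for illustration, and the sketch assumes the count is written in ASCII digits, sits at the very start of the string, and is followed by exactly one space. It counts characters rather than bytes, so it stays safe on non-ASCII input.

```rust
/// Parses "<n> <n characters><anything>" and returns `(n, the n characters)`.
/// Returns `None` if the string does not start with a number followed by a
/// space, or if fewer than `n` characters follow.
/// (`parse_counted` is a hypothetical helper name, not part of any crate.)
fn parse_counted(input: &str) -> Option<(usize, &str)> {
    // Split at the first space: "5 hello world" -> ("5", "hello world").
    let (digits, rest) = input.split_once(' ')?;
    // Loosely mirror \d+: require at least one ASCII digit and nothing else.
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let n: usize = digits.parse().ok()?;
    // Byte offset just past the n-th character. Counting chars (not bytes)
    // keeps the slice on a valid UTF-8 boundary.
    let end = if n == 0 {
        0
    } else {
        let (idx, ch) = rest.char_indices().nth(n - 1)?;
        idx + ch.len_utf8()
    };
    Some((n, &rest[..end]))
}

fn main() {
    println!("{:?}", parse_counted("5 hello world")); // Some((5, "hello"))
    println!("{:?}", parse_counted("3 héllo")); // Some((3, "hél"))
    println!("{:?}", parse_counted("9 short")); // None
}
```

Unlike chars().take(length), this returns None when fewer than n characters follow instead of silently returning a shorter string. Whether that is the behaviour you want depends on your input.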
Upvotes: 3
Reputation: 22042
If Perl is an option, you could try:
    perl -e '
        $str = "5 abcdefgh";
        $str =~ /(\d+) ((??{".{".($^N)."}"}))/;
        print "1st capture group = $1\n";
        print "2nd capture group = $2\n";
        print "whole capture group = $&\n";
    '
Output:
1st capture group = 5
2nd capture group = abcde
whole capture group = 5 abcde
[Explanation]
When a (??{...}) block is encountered in a regex, its contents are run as Perl code on the fly and the result is used as a pattern.
$^N refers to the most recently closed capture group and expands to 5 in this case.
(??{".{".($^N)."}"}) therefore evaluates to .{5}, which is a dot followed by a quantifier: any five characters other than a newline.
Upvotes: 0