Repeating numbered capture groups in Perl

Question

Imagine I'm trying to parse the following HTML using a Perl regex:

test
 num1
 num2
 num3
test
 num1
 num2
 num3
 num4

using the following regular expression:

([\w\s]*)
(?:([\w\s]+))+

How would the numbered groups be structured in Perl? $1 would obviously contain the outer tag's text, but when the capture group repeats, are the captured inner tags then sent to $2, $3 and $4? Is there a good way to capture all the inner tags in an array? Is this even something Perl supports? Or am I forced to write one regex for the outer tag, then another for the inner ones?

(I'm aware I could use `HTML::Tree` or something similar to parse the HTML, but this is just a simplified example to help describe the question. I'm really only interested in how repeated numbered capture groups work in Perl.)

melwil · Accepted Answer

When you repeat a capturing group, only the text matched by its last iteration is stored in that group. The repetitions do not spill over into $3, $4 and so on.
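A minimal sketch of that behaviour. The input string and the pattern are simplified stand-ins chosen for illustration (plain words instead of the markup from the question), and the variable names are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical input: a label followed by several items.
my $text = "test num1 num2 num3";

# Group 2 sits inside a repeated non-capturing group, so each
# iteration overwrites what the previous one captured.
my ($label, $last) = $text =~ /^(\w+)(?:\s+(\w+))+$/;

print "group 1: $label\n";   # the label
print "group 2: $last\n";    # only the final repetition survives
```

Here group 2 ends up holding `num3`; `num1` and `num2` were matched along the way but are not kept anywhere.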

If you want to get each match from a repeating group, you could use a global substitution with a callback (`s///ge` in Perl) or iterate through the matches one by one.
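Iterating one match at a time might look like the sketch below. It assumes the same simplified sample string rather than the original markup, and the `num\d+` pattern is only an illustration:

```perl
use strict;
use warnings;

# Hypothetical input, standing in for the markup in the question.
my $text = "test num1 num2 num3";

my @items;
# In scalar context, a /g match resumes where the previous match ended,
# so each pass through the loop sees the next item in $1.
while ($text =~ /\b(num\d+)\b/g) {
    push @items, $1;
}

print join(", ", @items), "\n";
```

Because the group is no longer repeated inside a single match, each iteration gets its own fresh $1.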

Most languages also have a "match all" operation; in Perl this is a match with the `/g` modifier evaluated in list context. It stores all matches in an array for you, but within each match a repeated group still holds only its last iteration.
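One way to combine the two ideas is a two-step approach: match each record once, then use a list-context `/g` match to collect its items into an array. This is a sketch under the assumption that each record sits on its own line as plain words; the data layout and the variable names are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical input: one record per line, a label followed by items.
my $text = "test num1 num2 num3\ntest num1 num2 num3 num4";

my @records;
for my $line (split /\n/, $text) {
    # Step 1: split the record into its label and the rest of the line.
    my ($label, $rest) = $line =~ /^(\w+)\s+(.*)$/ or next;

    # Step 2: in list context, /g returns every capture as a list.
    my @nums = $rest =~ /(\w+)/g;

    push @records, [$label, \@nums];
}

for my $rec (@records) {
    print "$rec->[0]: @{ $rec->[1] }\n";
}
```

The second record yields four items even though the pattern has a single capture group, because `/g` reapplies the whole pattern rather than repeating a group inside one match.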
