ZachTurn

Reputation: 656

Capturing the same regular expression over multiple lines

I want to capture a series of file names, each listed on its own line. I have figured out how to capture the file name on the first line, but not how to repeat the match on the subsequent lines.

# Input
# data/raw/file1
# data/raw/file2

# Output
# data/interim/file1
# data/interim/file2

Current Attempt

The regular expression I currently have is

# Input\n(# (.*))

And my inner capture group properly captures data/raw/file1.

Desired Output

What I want is to grab all of the files in between # Input and # Output, so in this example, data/raw/file1 and data/raw/file2.

Upvotes: 0

Views: 156

Answers (2)

revo

Reputation: 48711

Go with \G magic:

(?:^#\s+Input|\G(?!\A))\R*(?!#\s+Output)#\s*(.*)|[\s\S]*

Regex breakdown

(?:                 # Start of non-capturing group (a)
    ^#\s+Input          # Match a line beginning with `# Input`
    |                   # Or
    \G(?!\A)            # Continue from where the previous match ended (but not at the start of the subject)
)                   # End of NCG (a)
\R*                 # Match any number of line breaks (any newline sequence)
(?!#\s+Output)      # Assert that what follows is not a `# Output` line
#\s*(.*)            # Match a path line and capture the path
|                   # If none of the above matches...
[\s\S]*             # ...consume the rest of the input so the engine does not retry at every position

PHP code:

$re = '~(?:^#\s+Input|\G(?!\A))\R*(?!#\s+Output)#\s*(.*)|[\s\S]*~m';
$str = '# Input
# data/raw/file1
# data/raw/file2

# Output
# data/interim/file1
# data/interim/file2';

preg_match_all($re, $str, $matches, PREG_PATTERN_ORDER, 0);

// Print the non-empty captures from group 1
// (matches made by the [\s\S]* alternative leave group 1 empty)
print_r(array_filter($matches[1]));

Output:

Array
(
    [0] => data/raw/file1
    [1] => data/raw/file2
)
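
If you want to reuse this, here is a minimal sketch of the same pattern wrapped in a function. The name extract_input_paths is hypothetical (it is not part of the answer above), and it uses PREG_SET_ORDER instead of array_filter so that each match can be checked individually.

```php
<?php
// Hypothetical helper: returns the paths listed under "# Input".
function extract_input_paths(string $text): array
{
    // Same pattern as in the answer above.
    $re = '~(?:^#\s+Input|\G(?!\A))\R*(?!#\s+Output)#\s*(.*)|[\s\S]*~m';
    preg_match_all($re, $text, $matches, PREG_SET_ORDER);

    $paths = [];
    foreach ($matches as $m) {
        // Matches made by the [\s\S]* fallback have no (or an empty) group 1.
        if (isset($m[1]) && $m[1] !== '') {
            $paths[] = $m[1];
        }
    }
    return $paths;
}

$str = "# Input\n# data/raw/file1\n# data/raw/file2\n\n# Output\n# data/interim/file1\n# data/interim/file2";
print_r(extract_input_paths($str));
```

One thing to check against your own data: as far as I can tell, because the [\s\S]* fallback swallows the remainder whenever the first alternative fails, the pattern only finds paths when # Input is the very first line of the subject.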

Upvotes: 2

chris85

Reputation: 23892

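With the s modifier (so that . also matches newlines), preg_match can grab the whole block between # Input and # Output, and preg_split can then break that block into individual entries.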

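$str = '# Input
# data/raw/file1
# data/raw/file2

# Output
# data/interim/file1
# data/interim/file2';

// Capture everything between the "# Input" and "# Output" lines.
preg_match('/# Input\n(# (?:.*?))# Output/s', $str, $match);
// Split on the "# " prefix, then trim the newline left at the end of each entry.
$matched = array_map('trim', preg_split('/# /', $match[1], -1, PREG_SPLIT_NO_EMPTY));
print_r($matched);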

Demo: https://3v4l.org/dAcRp

Regex demo: https://regex101.com/r/5tfJGM/1/
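
A variant of the same two-step idea, as a sketch rather than the code from the demos above: isolate the block with preg_match first, then let preg_match_all capture one path per line instead of splitting. The second pattern and the $block, $m and $paths names are my own choices, and it assumes every path line starts with # followed by at least one space or tab.

```php
<?php
$str = "# Input\n# data/raw/file1\n# data/raw/file2\n\n# Output\n# data/interim/file1\n# data/interim/file2";

$paths = [];
// Step 1: isolate everything between the "# Input" and "# Output" marker lines.
// m lets ^ match at each line start, s lets . match newlines.
if (preg_match('/^# Input\R(.*?)^# Output/ms', $str, $block)) {
    // Step 2: capture the text after "# " on each line of that block.
    // Without the s modifier, . stops at the end of the line.
    preg_match_all('/^#\h+(\S.*)$/m', $block[1], $m);
    $paths = $m[1];
}
print_r($paths);
```

Because the per-line pattern stops at the end of each line, the captured entries carry no trailing newline here, and blank lines inside the block are skipped.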

Upvotes: 2
