transilvlad

Reputation: 14532

regex: match everything until a new line comes without a space after it

I have this example:

This is a simple test text.
Yet another line.
START: This is the part that
 needs match.
This part does not need
 capture.
Wherever else text.

I want to match this part:

START: This is the part that
 needs match.

The point is that I know START: is there, and the match should end at the first new line that is followed by anything other than a space.

I have tried a lot of combinations starting from: START: (.*?)

I have played around with \r and everything else I could think of to match only when the next line does not start with whitespace.

I am not a noob asking because I am lazy. I spent a few hours before asking.

Upvotes: 5

Views: 18340

Answers (2)

Francesco Casula

Reputation: 27130

This code works correctly with your example text.

The workaround is to replace the new lines with a token before calling preg_match (they are restored afterwards) and to add the Ungreedy modifier (U) at the end of the regex.

<?php

$token = '#####';

$text = <<<TXT
This is a simple test text.
Yet another line.
START: This is the part that
 needs match.
This part does not need
 capture.
Wherever else text.
TXT;

$text = str_replace("\n", $token, $text);

if (preg_match('/(?P<match>START:(.)*)(' . $token . '){1}[^ ]+/Uu', $text, $matches))
{
    $match = str_replace($token, "\n", $matches['match']);
    var_dump($match);
}

$text = str_replace($token, "\n", $text);

Output will be:

string(42) "START: This is the part that
 needs match."
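The same placeholder technique can be wrapped in a function, as a minimal sketch. The helper name `extractStartBlock` is hypothetical and not part of the code above. It assumes the token never occurs in the input and that the block is followed by at least one more line, and it passes the token through `preg_quote()` so that a token containing regex metacharacters would still be matched literally.

```php
<?php
// Hypothetical helper (not from the answer): same placeholder technique,
// with the token escaped via preg_quote() before it is embedded in the pattern.
function extractStartBlock(string $text, string $token = '#####'): ?string
{
    // Collapse the text to a single line by swapping line breaks for the token.
    $flat = str_replace("\n", $token, $text);
    $quoted = preg_quote($token, '/');

    // U makes the quantifiers lazy, so the match stops at the first token
    // that is followed by a character other than a space.
    $pattern = '/(?P<match>START:.*)' . $quoted . '[^ ]/U';
    if (!preg_match($pattern, $flat, $matches)) {
        return null;
    }

    // Restore the line breaks inside the captured block.
    return str_replace($token, "\n", $matches['match']);
}

// Usage with a short made-up input.
echo extractStartBlock("START: first line\n continued\nnext line"), "\n";
```

Because the function works on a copy of the text, the original string does not need to be restored afterwards.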

Upvotes: 1

Tim Pietzcker

Reputation: 336158

How about this:

preg_match(
    '/^         # Start of line
    START:\     # Match "START: "
    .*          # Match any characters except newline
    \r?\n       # Match newline
    (?:         # Try to match...
     ^          # from the start of the line:
     \ +        #  - one or more spaces
     .*         #  - any characters except newline
     \r?\n      #  - newline
    )*          # Repeat as needed/mx',
    $subject, $matches);

This assumes that all lines are newline-terminated.
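As a rough sketch, the pattern from this answer can be run against the sample text from the question like this. The variable names `$terminated`, `$unterminated`, `$m1` and `$m2` are illustrative. The second pattern is an assumed variant, not part of the answer: it consumes the line break before each continuation line rather than after it, so it does not need a newline at the end of the input. The sample is assumed to use `\n` line endings.

```php
<?php
// Sample text from the question; a heredoc body has no trailing line break.
$subject = <<<TXT
This is a simple test text.
Yet another line.
START: This is the part that
 needs match.
This part does not need
 capture.
Wherever else text.
TXT;

// Compact form of the pattern in this answer (no x modifier, so no inline
// comments). Every matched line must end with a line break.
$terminated = '/^START: .*\r?\n(?: +.*\r?\n)*/m';

// Assumed variant, not from the answer: the line break is matched before
// each continuation line, so the last line may be unterminated and the
// match carries no trailing line break.
$unterminated = '/^START: .*(?:\R +.*)*/m';

// A line break is appended for the first pattern to satisfy its assumption.
preg_match($terminated, $subject . "\n", $m1);
preg_match($unterminated, $subject, $m2);

// The first match includes the final line break; trim it before comparing.
var_dump(rtrim($m1[0], "\r\n") === $m2[0]); // bool(true)
echo $m2[0], "\n";
```

Both patterns return the START: line plus the one continuation line that begins with a space.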

Upvotes: 12
