netbrain

Reputation: 9304

Regex: using dot all flag, how to prefix each line?

What would be the proper regex to prefix each line?

Say I have the following input data:

SOME OTHER DATA

TABLE
ROW
ROW
ROW
END

SOME OTHER DATA

I'm only interested in what lies between TABLE and END, inclusive.

In PHP you can write a regex like /TABLE.*?END/s, which matches from the first occurrence of TABLE to the first occurrence of END after it. But is there a way to prefix each of those lines with %, so that the result becomes:

SOME OTHER DATA

%TABLE
%ROW
%ROW
%ROW
%END

SOME OTHER DATA

Any help is appreciated.
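For reference, here is a minimal sketch of what the /TABLE.*?END/s pattern from the question matches on its own, assuming the sample input is held in a string with LF newlines:

```php
<?php
// Sample input from the question (LF newlines assumed)
$txt = "SOME OTHER DATA\n\nTABLE\nROW\nROW\nROW\nEND\n\nSOME OTHER DATA";

// The s modifier lets . match newlines, so .*? can span several lines;
// the lazy quantifier stops at the first END after TABLE.
if (preg_match('/TABLE.*?END/s', $txt, $m)) {
    echo $m[0], "\n"; // the TABLE ... END block, without any prefix
}
```

It finds the block, but it does not give you a position for each line, which is what the answers below work around.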

Upvotes: 0

Views: 420

Answers (3)

Casimir et Hippolyte

Reputation: 89584

You can do it with a single replacement:

$txt = preg_replace('~^(?:TABLE\R|\G(?!\A)(?:END$|.+\R|.+\z))~m', '%$0', $txt);

Note that this pattern assumes there is always a closing END "tag". If there isn't one, the replacement continues until an empty line (because of the + quantifier) or the end of the string.
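A minimal sketch that wraps the first pattern in a small helper (the name prefixTable is just for illustration) and runs it on the question's sample, plus an unclosed TABLE to show where the prefixing stops:

```php
<?php
// Prefix every line from TABLE to END with %, using the first pattern.
function prefixTable(string $txt): string
{
    return preg_replace('~^(?:TABLE\R|\G(?!\A)(?:END$|.+\R|.+\z))~m', '%$0', $txt);
}

$input = "SOME OTHER DATA\n\nTABLE\nROW\nROW\nROW\nEND\n\nSOME OTHER DATA";
echo prefixTable($input), "\n";

// Without a closing END, prefixing stops at the first empty line,
// because .+ cannot match an empty line.
$unclosed = "TABLE\nROW\n\nSOME OTHER DATA";
echo prefixTable($unclosed), "\n";
```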

You can also check that the TABLE tag is actually closed by an END tag:

$pattern = '~^(?:TABLE\R(?=(?:.+\R)*?END$)|\G(?!\A)(?:END$|.+\R|.+\z))~m';
$txt = preg_replace($pattern, '%$0', $txt);

First pattern details:

^                   # matches the start of a line
(?:                 # open a non-capturing group
    TABLE \R        # TABLE and a newline (CR, LF or CRLF)
  |                 # OR
    \G (?!\A)       # contiguous to the previous match, but not
                    # at the start of the string
    (?:             #
        END $       # END at the end of a line
      |             #
        .+ \R       # a line (not empty) and a newline
      |             #
        .+ \z       # the last line of the string
    )               # close the non-capturing group
)                   #
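The commented layout above is close to a usable pattern already: with the x modifier, PCRE ignores whitespace and #-comments. A minimal sketch, assuming a nowdoc string so that the backslashes reach PCRE untouched:

```php
<?php
// The breakdown above, used directly as a pattern thanks to the x modifier.
$pattern = <<<'REGEX'
~
^                   # start of a line
(?:
    TABLE \R        # TABLE and a newline (CR, LF or CRLF)
  |
    \G (?!\A)       # contiguous to the previous match, not at the string start
    (?:
        END $       # END at the end of a line
      |
        .+ \R       # a non-empty line and its newline
      |
        .+ \z       # the last line of the string
    )
)
~mx
REGEX;

$txt = "x\nTABLE\nROW\nEND\ny";
$result = preg_replace($pattern, '%$0', $txt);
echo $result, "\n";
```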

Additional lookahead details:

(?=             # open the lookahead
    (?:.+\R)*?  # matches zero or more lines lazily
    END$        # until the line END
)
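A short sketch comparing a closed and an unclosed block with the lookahead version; when no END line follows, TABLE\R cannot match, so \G never gets a starting point and the text is left as it is:

```php
<?php
// Second pattern: only prefix a TABLE block if a closing END line follows.
$pattern = '~^(?:TABLE\R(?=(?:.+\R)*?END$)|\G(?!\A)(?:END$|.+\R|.+\z))~m';

$closed   = "TABLE\nROW\nEND\nOTHER";
$unclosed = "TABLE\nROW\n\nOTHER";

$closedResult   = preg_replace($pattern, '%$0', $closed);
$unclosedResult = preg_replace($pattern, '%$0', $unclosed);
echo $closedResult, "\n---\n", $unclosedResult, "\n";
```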

Another way:

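$arr = preg_split('/\R/', $txt);   // split on any newline (CR, LF or CRLF)
$state = false;                     // true while inside a TABLE...END block
foreach ($arr as &$line) {
    if ($state || $line === 'TABLE') {
        $state = ($line !== 'END');  // leave the block after the END line
        $line = '%' . $line;
    }
}
unset($line);                       // break the reference left by foreach
$txt = implode("\n", $arr);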

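This code behaves like the first pattern, except that empty lines inside a block are prefixed too, so an unclosed TABLE runs to the end of the string instead of stopping at an empty line. Note that the result always uses Unix (LF) newlines.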
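If you need to keep the original newlines, one option is to capture the separators while splitting. A minimal sketch (the function name is hypothetical): with PREG_SPLIT_DELIM_CAPTURE, the matched \R sequences come back as extra array elements, so lines sit at even indexes and newlines at odd ones, and implode('') restores the text byte for byte apart from the added prefixes.

```php
<?php
// Same state machine as above, but CRLF/LF/CR separators are kept as they were.
function prefixTablePreservingNewlines(string $txt): string
{
    $parts = preg_split('/(\R)/', $txt, -1, PREG_SPLIT_DELIM_CAPTURE);
    $state = false; // true while inside a TABLE...END block
    for ($i = 0, $n = count($parts); $i < $n; $i += 2) {
        $line = $parts[$i];
        if ($state || $line === 'TABLE') {
            $state = ($line !== 'END');
            $parts[$i] = '%' . $line;
        }
    }
    return implode('', $parts);
}

echo prefixTablePreservingNewlines("A\r\nTABLE\r\nROW\r\nEND\r\nB"), "\n";
```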

Upvotes: 3

Firas Dib

Reputation: 2621

Here you go. I created a single regex and commented it for you:

/(?:
    #start by finding the table start, so that its end position can be reused by \G
    TABLE\n\K|
    #after the table head, continue from the end of the previous match; make sure we are not at the beginning of the string
    \G(?<!^)
)
#the data we are interested in
(?:
    #make sure END does not start at this position
    (?!END)
    #match everything up to the line ending
    .
)*
#consume the newline at the end of the line
\n/x

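Replace each match with %$0. Note that \K drops TABLE\n from the reported match and the END line itself is never matched, so as written only the lines between TABLE and END get the prefix.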

See it in action here: http://regex101.com/r/rA5bV1
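A minimal sketch that runs this pattern, written on a single line without the x modifier, on a short LF-only sample so you can see exactly which lines receive the prefix:

```php
<?php
// Firas Dib's pattern without the comments (so no x modifier is needed).
$pattern = '/(?:TABLE\n\K|\G(?<!^))(?:(?!END).)*\n/';

$txt = "OTHER\nTABLE\nROW\nROW\nEND\nOTHER";
$result = preg_replace($pattern, '%$0', $txt);
echo $result, "\n";
```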

--

I do recommend, however, that if you do not understand the regex I have created, you use an alternative method: create a regex that captures the contents of the table, and then just prepend % to every line. Use the following expression to capture the contents: /TABLE\n((?:(?!END).)*)END/s (the s modifier is needed so that . can match the newlines between the rows). I haven't commented this one; you should be able to work it out from the comments on the other expression.
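One way to carry out that two-step approach, sketched with preg_replace_callback (the helper name prefixBlocks is hypothetical). It prefixes the whole matched block, TABLE and END included, to produce the output asked for in the question:

```php
<?php
// Find each TABLE...END block, then prefix every line inside it.
function prefixBlocks(string $txt): string
{
    return preg_replace_callback(
        '/TABLE\n(?:(?!END).)*END/s',
        function (array $m): string {
            // With the m modifier, ^ matches at the start of every line of the block.
            return preg_replace('/^/m', '%', $m[0]);
        },
        $txt
    );
}

$txt = "OTHER\nTABLE\nROW\nROW\nEND\nOTHER";
echo prefixBlocks($txt), "\n";
```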

Upvotes: 1

Thibault

Reputation: 1596

You should do it with two regexes:

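$txt = file_get_contents('input.txt');
// $m[1]: text before TABLE, $m[2]: the TABLE...END block, $m[3]: text after it
if (preg_match('#(.*?)(^TABLE$.*?^END$)(.*)#ms', $txt, $m)) {
    // with the m modifier, ^ matches at the start of every line of the block
    $txt = $m[1] . preg_replace('#^#m', '%', $m[2]) . $m[3];
}
print $txt;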

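The s modifier lets . match newlines, so .* can span several lines; the m modifier makes ^ and $ match at the start and end of every line, which is what lets preg_replace() insert % at the start of each line of the block.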
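A small sketch of the two modifiers in isolation, on a made-up three-line string, in case the difference is still unclear:

```php
<?php
$block = "ROW\nROW\nEND";

// Without m, ^ only matches at the very start of the string.
$withoutM = preg_replace('/^/', '%', $block);
// With m, ^ also matches right after every newline.
$withM = preg_replace('/^/m', '%', $block);

// Without s, . stops at a newline, so ROW.*END cannot cross lines here.
preg_match('/ROW.*END/', $block, $noS);
// With s, . matches newlines as well.
preg_match('/ROW.*END/s', $block, $withS);

var_dump($withoutM, $withM, $noS, $withS[0]);
```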

If you want to do it with only one regex, you will have to use special matching blocks like one of these:

Hope that helps.

Upvotes: 0
