Rog


PHP Regex is truncating matches

A little help needed.

I am part way there I think.

I have strings like this in a body of text:

"line: this is something or other with an escaped semi-colon here \; but I want to ignore that up to this final one;"

So I want the escaped semi-colon in the middle of my string to be included, not treated as the end of the string; the string should end at the final semi-colon.

I have this regex pattern:

$regex = "/line:(.*?)[^\\\;];/";

It matches the pattern when I run this:

preg_match_all($regex, $texttosearch, $matches);

However, the contents of $matches[1][0] are truncated; in this example the final 'e' is missing:

Array
(
[0] => Array
    (
        [0] => line: this is something or other with an escaped semi-colon here \; but I want to ignore that up to this final one;
    )

[1] => Array
    (
        [0] =>  this is something or other with an escaped semi-colon here \; but I want to ignore that up to this final on
    )

 )
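A minimal reproduction, assuming the sample string above is the whole input. In the double-quoted literal, PHP turns \\ into a single backslash and leaves the unknown escape \; as is, so PCRE receives /line:(.*?)[^\\;];/. The class [^\\;] sits outside the capture group, so it consumes the last character before the final ; (the e of "one"), and that character never reaches group 1.

```php
<?php
// Reproduces the truncation from the question.
// In a single-quoted literal, \; stays as the two characters \ and ;.
$texttosearch = 'line: this is something or other with an escaped semi-colon here \; '
    . 'but I want to ignore that up to this final one;';

// Double-quoted: "\\" becomes \ and "\;" is kept verbatim, so the
// regex engine receives /line:(.*?)[^\\;];/
$regex = "/line:(.*?)[^\\\;];/";

preg_match_all($regex, $texttosearch, $matches);

// [^\\;] cannot match the backslash before the first ;, so the lazy .*?
// keeps growing until [^\\;] can match the "e" of "one" followed by ;.
// That "e" is part of the full match but not of group 1.
echo $matches[0][0], "\n"; // full match, ends with "final one;"
echo $matches[1][0], "\n"; // group 1, ends with "final on"
```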

Could anyone help me see where I am going wrong, please?

Thank you.


Answers (1)

Wiktor Stribiżew

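I think that just using a lookbehind to check that a ; is not preceded by \ is error-prone if you may have other escape sequences: in \\; the backslash itself is escaped, so that ; is a real terminator, yet a lookbehind would skip it. Use this unrolled regex (written as a PHP single-quoted string literal):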

'~line:([^;\\\\]*(?:\\\\.[^;\\\\]*)*);~'

See the regex demo
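A short sketch applying the pattern above to the string from the question with preg_match_all. In the single-quoted literal each \\\\ reaches PCRE as \\, which matches one literal backslash.

```php
<?php
// Sample input taken from the question.
$texttosearch = 'line: this is something or other with an escaped semi-colon here \; '
    . 'but I want to ignore that up to this final one;';

// PCRE receives ~line:([^;\\]*(?:\\.[^;\\]*)*);~
$regex = '~line:([^;\\\\]*(?:\\\\.[^;\\\\]*)*);~';

preg_match_all($regex, $texttosearch, $matches);

// Group 1 keeps the escaped \; and the final "e" of "one";
// only the terminating ; is left outside the group.
echo $matches[1][0], "\n";
```

Because every backslash is consumed together with the character after it, the final ; can only be matched at a position that is not part of an escape sequence.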

Details:

  • line: - literal substring (to match it as a whole word, add \b in front of it)
  • ([^;\\]*(?:\\.[^;\\]*)*) - Group 1 capturing:
    • [^;\\]* - 0+ chars other than ; and \
    • (?:\\.[^;\\]*)* - 0+ sequences of:
      • \\. - any escaped char (add the s modifier after the closing ~ to let . match line breaks, too)
      • [^;\\]* - 0+ chars other than ; and \
  • ; - a semi-colon
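To illustrate the escaped-backslash case, here is a sketch comparing the unrolled pattern with a lookbehind version. The lookbehind pattern ~line:(.*?)(?<!\\);~ and the two-entry input string are hypothetical examples written for this comparison, not taken from the question.

```php
<?php
// Hypothetical input: "\\" is an escaped backslash, so the first ;
// really ends the first "line:" entry.
$text = 'line: ends with a backslash \\\\; line: second;';

// Hypothetical lookbehind approach: a ; ends the entry only if no \ precedes it.
$lookbehind = '~line:(.*?)(?<!\\\\);~';
// Unrolled approach from the answer: a \ always consumes the next char.
$unrolled = '~line:([^;\\\\]*(?:\\\\.[^;\\\\]*)*);~';

preg_match_all($lookbehind, $text, $lb);
preg_match_all($unrolled, $text, $ur);

// The lookbehind rejects the first ; and runs on into the second entry,
// producing a single, merged match.
print_r($lb[1]);
// The unrolled pattern treats \\ as one escape and finds both entries.
print_r($ur[1]);
```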
