Simon

Reputation: 602

Maximum character length for PHP multiline regular expressions?

I'm trying to evaluate a multiline RegExp with preg_match_all. Unfortunately there seems to be a character limit around 24,000 characters (24,577 to be specific). Does anyone know how to get this to work?

Minimal example:

<?php
// Stand-in for the real input: one TRACE entry with a 24,577-character payload.
$data = 'TRACE: ' . str_repeat('a', 24577) . "\n";
preg_match_all('/([A-Z]+): ((?:(?![A-Z]+:).)*)\n/s', $data, $matches);
var_dump($matches);
?>

Working example (with < 24,577 characters): https://3v4l.org/8iRCc

Example that's NOT working (with > 24,577 characters): https://3v4l.org/ceKn6
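
One way to narrow down why the longer input fails might be to check the return value of preg_match_all and then preg_last_error(). The sketch below assumes PHP 7 or later; pregErrorName is a hypothetical helper (not a built-in function), and the 30,000-character payload is only a stand-in for the real data.

```php
<?php
// Hypothetical helper (not from the original post): map a preg_last_error()
// code to the name of the matching PREG_* constant.
function pregErrorName(int $code): string
{
    $names = [
        PREG_NO_ERROR              => 'PREG_NO_ERROR',
        PREG_INTERNAL_ERROR        => 'PREG_INTERNAL_ERROR',
        PREG_BACKTRACK_LIMIT_ERROR => 'PREG_BACKTRACK_LIMIT_ERROR',
        PREG_RECURSION_LIMIT_ERROR => 'PREG_RECURSION_LIMIT_ERROR',
        PREG_BAD_UTF8_ERROR        => 'PREG_BAD_UTF8_ERROR',
        PREG_BAD_UTF8_OFFSET_ERROR => 'PREG_BAD_UTF8_OFFSET_ERROR',
        PREG_JIT_STACKLIMIT_ERROR  => 'PREG_JIT_STACKLIMIT_ERROR',
    ];
    return $names[$code] ?? 'unknown error code ' . $code;
}

// Assumed stand-in input: a single entry with a long payload.
$data = 'TRACE: ' . str_repeat('a', 30000) . "\n";
$count = preg_match_all('/([A-Z]+): ((?:(?![A-Z]+:).)*)\n/s', $data, $matches);

if ($count === false) {
    // preg_match_all returns false when the regex engine gives up with an error.
    echo 'preg_match_all failed: ', pregErrorName(preg_last_error()), PHP_EOL;
} else {
    echo 'matches found: ', $count, PHP_EOL;
}
```

Which error is reported, if any, probably depends on the PHP version and on the pcre.* ini settings, so the output is not shown here.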

Upvotes: 1

Views: 173

Answers (2)

The fourth bird

Reputation: 163207

You might rewrite the pattern using a negated character class instead of the tempered greedy token approach with the negative lookahead:

([A-Z]+): ([^A-Z\r\n]*(?>(?:\r?\n|[A-Z](?![A-Z]*:))[^A-Z\r\n]*)*)\r?\n
  • ([A-Z]+): Capture group 1, match 1+ uppercase chars, then match : and a space
  • ( Capture group 2
    • [^A-Z\r\n]* Match 0+ times any char except A-Z or a newline
    • (?> Atomic group
      • (?: Non capture group
        • \r?\n Match a newline
        • | Or
        • [A-Z] Match a single char A-Z
        • (?![A-Z]*:) Negative lookahead, assert that what follows is not optional chars A-Z and then :
      • ) Close non capture group
      • [^A-Z\r\n]* Optionally match any char except A-Z or a newline
    • )* Close atomic group and optionally repeat
  • )\r?\n Close group 2 and match a newline
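
A minimal sketch of how this pattern could be used with preg_match_all. The sample data (a long TRACE entry followed by a short DEBUG entry) and the 30,000-character length are made up for illustration.

```php
<?php
// Made-up sample data: one very long entry followed by a short one.
$data = 'TRACE: ' . str_repeat('a', 30000) . "\n" . "DEBUG: hello World\n";

// Pattern from above. The negated character class consumes long runs of
// characters directly instead of running a lookahead before every character.
$re = '/([A-Z]+): ([^A-Z\r\n]*(?>(?:\r?\n|[A-Z](?![A-Z]*:))[^A-Z\r\n]*)*)\r?\n/';

$count = preg_match_all($re, $data, $matches);

echo $count, PHP_EOL;                      // number of entries found
echo implode(', ', $matches[1]), PHP_EOL;  // labels from capture group 1
echo strlen($matches[2][0]), PHP_EOL;      // length of the first captured payload
echo $matches[2][1], PHP_EOL;              // second captured payload
```

Group 2 of the second entry keeps the uppercase W, because the lookahead only rejects an uppercase char when the uppercase run it starts is followed by a colon.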

Regex demo | Php demo

If the TRACE: is at the start of the string, you can also add an anchor:

^([A-Z]+): ([^A-Z\r\n]*(?>(?:\r?\n|[A-Z](?![A-Z]*:))[^A-Z\r\n]*)*)\r?\n

Regex demo

Edit

If every entry starts with the same format (uppercase chars, a colon and a space), you can capture the rest of that line and then match all following lines that do not start with that opening format.

^([A-Z]+): (.*(?:\r?\n(?![A-Z]+: ).*)*)

The pattern matches:

  • ^ Start of string (or start of a line when the m flag is set)
  • ([A-Z]+): Capture group 1, followed by : and a space
  • ( Capture group 2
    • .* Match the rest of the line
    • (?:\r?\n(?![A-Z]+: ).*)* Repeat matching all lines that do not start with the pattern [A-Z]+:
  • ) Close group 2

Regex demo

In PHP you can use:

$re = '/^([A-Z]+): (.*(?:\r?\n(?![A-Z]+: ).*)*)/m';

Php demo
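
A short sketch of the line-based pattern in use, assuming made-up sample data in which the TRACE entry continues on a second line:

```php
<?php
// Made-up sample data: the TRACE entry spans two lines.
$data = "TRACE: first line\n  continued line\nDEBUG: second entry";

$re = '/^([A-Z]+): (.*(?:\r?\n(?![A-Z]+: ).*)*)/m';

// PREG_SET_ORDER groups the results per match instead of per capture group.
preg_match_all($re, $data, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    // $m[1] is the label, $m[2] the text including any continuation lines.
    echo $m[1], ' => ', json_encode($m[2]), PHP_EOL;
}
```

If the input ends with a newline, the last capture appears to include it, because the negative lookahead also succeeds at the end of the string. Applying rtrim() to group 2 could be an option if that is unwanted.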

Upvotes: 1

user9168908

Reputation:

Try this

preg_match('/\A(?>[^\r\n]*(?>\r\n?|\n)){0,4}[^\r\n]*\z/',$data)

Upvotes: 0
