Reputation: 602
I'm trying to evaluate a multiline RegExp with preg_match_all.
Unfortunately, matching seems to stop working once the input reaches about 24,000 characters (24,577 to be specific).
Does anyone know how to get this to work?
Minimal example:
<?php
// 'TRACE: ' followed by 24,577 filler characters and the newline the pattern expects
$data = 'TRACE: ' . str_repeat('a', 24577) . "\n";
preg_match_all('/([A-Z]+): ((?:(?![A-Z]+:).)*)\n/s', $data, $matches);
var_dump($matches);
?>
Working example (with < 24,577 characters): https://3v4l.org/8iRCc
Example that's NOT working (with > 24,577 characters): https://3v4l.org/ceKn6
Upvotes: 1
Views: 173
Reputation: 163207
You might rewrite the pattern using a negated character class instead of the tempered greedy token approach with the negative lookahead:
([A-Z]+): ([^A-Z\r\n]*(?>(?:\r?\n|[A-Z](?![A-Z]*:))[^A-Z\r\n]*)*)\r?\n
([A-Z]+):
Capture group 1, match 1+ uppercase chars, then match : and a space
(
Capture group 2
[^A-Z\r\n]*
Match 0+ times any char except A-Z or a newline
(?>
Atomic group
(?:
Non-capture group
\r?\n
Match a newline
|
Or
[A-Z]
Match a single char A-Z
(?![A-Z]*:)
Negative lookahead, assert that what follows is not optional chars A-Z and then :
)
Close the non-capture group
[^A-Z\r\n]*
Optionally match any char except A-Z or a newline
)*
Close the atomic group and optionally repeat it
)\r?\n
Close group 2 and match a newline
If the TRACE: is at the start of the string, you can also add an anchor:
^([A-Z]+): ([^A-Z\r\n]*(?>(?:\r?\n|[A-Z](?![A-Z]*:))[^A-Z\r\n]*)*)\r?\n
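As a rough way to check this on your own data, the sketch below runs both the pattern from the question and the anchored pattern above against a long input and reports preg_last_error() for each. The test data, the m flag, and the tryPattern helper are my own assumptions for illustration. Whether the original pattern actually fails, and with which error code, depends on your PHP version and PCRE settings (pcre.jit, pcre.backtrack_limit).

```php
<?php
// Hypothetical test input: one very long TRACE entry followed by a short INFO entry.
$data = 'TRACE: ' . str_repeat('a', 30000) . "\n" . "INFO: short\n";

// Pattern from the question (tempered greedy token).
$tempered = '/([A-Z]+): ((?:(?![A-Z]+:).)*)\n/s';
// Anchored pattern from this answer; the m flag is an assumption so that ^
// matches at the start of every line, not only at the start of the string.
$negated = '/^([A-Z]+): ([^A-Z\r\n]*(?>(?:\r?\n|[A-Z](?![A-Z]*:))[^A-Z\r\n]*)*)\r?\n/m';

// Hypothetical helper: run a pattern and keep the return value,
// the PCRE error code and the captured groups together.
function tryPattern(string $pattern, string $data): array
{
    $result = preg_match_all($pattern, $data, $matches);
    return ['result' => $result, 'error' => preg_last_error(), 'matches' => $matches];
}

$old = tryPattern($tempered, $data);
$new = tryPattern($negated, $data);

// A non-zero error code (for example PREG_JIT_STACKLIMIT_ERROR or
// PREG_BACKTRACK_LIMIT_ERROR) means the engine gave up; it is not a plain "no match".
printf("tempered: error code %d\n", $old['error']);
printf("negated:  error code %d, %d entries\n", $new['error'], count($new['matches'][1] ?? []));
```

Checking preg_last_error() after a call that returns false is what tells a PCRE limit error apart from input that simply does not match.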
Edit
If every entry starts with the same format, you can match its first line and then all following lines that do not start with that opening format.
^([A-Z]+): (.*(?:\r?\n(?![A-Z]+: ).*)*)
The pattern matches:
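^
Start of string (or start of a line when the m flag is set)
([A-Z]+):
Capture group 1, match 1+ uppercase chars, then match : and a space
(
Capture group 2
.*
Match the rest of the line
(?:\r?\n(?![A-Z]+: ).*)*
Repeat matching all lines that do not start with the pattern [A-Z]+: followed by a space
)
Close group 2
In PHP you can use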
$re = '/^([A-Z]+): (.*(?:\r?\n(?![A-Z]+: ).*)*)/m';
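A minimal usage sketch for this pattern, assuming a made-up log with one long TRACE entry that spans two lines and a short INFO entry (the sample data and variable names are hypothetical):

```php
<?php
// Hypothetical log: a long TRACE entry with a continuation line, then an INFO entry.
$continuation = '  at continuation line';
$data = implode("\n", [
    'TRACE: ' . str_repeat('a', 30000),
    $continuation,
    'INFO: done',
]);

$re = '/^([A-Z]+): (.*(?:\r?\n(?![A-Z]+: ).*)*)/m';

// PREG_SET_ORDER groups the captures per entry: [full match, level, message].
$count = preg_match_all($re, $data, $entries, PREG_SET_ORDER);

if ($count === false) {
    // false signals a PCRE error; preg_last_error() returns the error code.
    printf("PCRE error code %d\n", preg_last_error());
} else {
    foreach ($entries as [, $level, $message]) {
        printf("%s: %d characters\n", $level, strlen($message));
    }
}
```

Note that if the input ends with a newline, that final newline becomes part of the last entry's group 2, because the line after it is empty and does not start with the opening format. You may want to rtrim() the captured messages.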
Upvotes: 1
Reputation:
Try this
preg_match('/\A(?>[^\r\n]*(?>\r\n?|\n)){0,4}[^\r\n]*\z/',$data)
Upvotes: 0