Reputation: 281
I am processing a couple of GB of text, and my script dies in preg_replace(). After some research I extracted the problematic part of the text, where the leak appears.
preg_replace('/\b\p{L}{0,2}\b/u', '', "\x65\xe2\xba\xb7\x69\xe3\xb1\xae");
PHP Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 251105872 bytes)
I am trying to delete short words (up to 2 characters). I also found that if I change the regexp to:
preg_replace('/\b\p{L}{1,2}\b/u', '', "\x65\xe2\xba\xb7\x69\xe3\xb1\xae");
it works fine.
Can somebody explain what's going on, please? The first example works on 99% of texts.
Upvotes: 3
Views: 2245
Reputation: 71578
\b\p{L}{0,2}\b
        ^
The 0 here lets the regex match the empty string at every word boundary, so it matches in far more places than you need, and you can end up with twice as many matches to replace, or more.
E.g., you get 344 matches on a "Lorem ipsum" text with \b\p{L}{0,2}\b (regex101 demo), but only 19 with \b\p{L}{1,2}\b (regex101 demo).
And since it's a replace, that is many more replacements to perform!
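To see the difference without regex101, here is a minimal sketch that counts matches for both patterns with preg_match_all(). The sample sentence and the array keys are made up for illustration; they are not from the question's data.

```php
<?php
// Hypothetical ASCII sample text, chosen only for illustration.
$text = 'to be or not to be, that is the question';

$patterns = [
    'zero_allowed' => '/\b\p{L}{0,2}\b/u', // original pattern from the question
    'one_required' => '/\b\p{L}{1,2}\b/u', // pattern that requires at least one letter
];

$counts = [];
foreach ($patterns as $name => $pattern) {
    // preg_match_all() returns the number of full-pattern matches (or false on error).
    $counts[$name] = preg_match_all($pattern, $text, $matches);

    // Count how many of those matches are empty strings.
    $empty = count(array_filter($matches[0], function ($m) {
        return $m === '';
    }));

    printf("%-20s total=%d empty=%d\n", $pattern, $counts[$name], $empty);
}
```

The {0,2} pattern should report a noticeably higher total, and the extra matches are empty strings where nothing is actually removed. Requiring at least one letter with {1,2} skips them.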
Upvotes: 1