2ge

Reputation: 281

PHP: preg_replace eats all memory

I am processing a couple of GB of text, and my script dies on preg_replace(). After some research I extracted the problematic part of the text, where the leak appears.

preg_replace('/\b\p{L}{0,2}\b/u', '', "\x65\xe2\xba\xb7\x69\xe3\xb1\xae"); 

PHP Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 251105872 bytes)

I am trying to delete short words (up to 2 characters). I also found out that if I change the regexp to:

preg_replace('/\b\p{L}{1,2}\b/u', '', "\x65\xe2\xba\xb7\x69\xe3\xb1\xae"); 

it works just fine.

Can somebody please explain what is going on? The first example works on 99% of texts.

Upvotes: 3

Views: 2245

Answers (1)

Jerry

Reputation: 71578

\b\p{L}{0,2}\b
        ^

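The 0 here lets the pattern match with no letters at all, so it also produces empty matches at word boundaries. The regex therefore matches in more places than you need, possibly twice as many or more, and every one of those matches has to be replaced.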

E.g.: you get 344 matches on a "Lorem ipsum" text with \b\p{L}{0,2}\b (regex101 demo), but only 19 with \b\p{L}{1,2}\b (regex101 demo).

And since this is a replace, that means many more replacements to perform!
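To see the difference on a small scale, here is a minimal sketch that counts matches for both patterns and compares the replaced output. The sample string and variable names are made up for illustration; this only shows the extra empty matches and is not a reproduction of the memory error from the question.

```php
<?php
// Hypothetical sample text, chosen only for illustration.
$text = "a bc def";

// {0,2} allows zero letters, so empty matches at word boundaries are counted too.
$loose = preg_match_all('/\b\p{L}{0,2}\b/u', $text, $looseMatches);

// {1,2} requires at least one letter, so only real short words match.
$strict = preg_match_all('/\b\p{L}{1,2}\b/u', $text, $strictMatches);

printf("{0,2}: %d matches, {1,2}: %d matches\n", $loose, $strict);

// Replacing an empty match with an empty string changes nothing,
// so both patterns should produce the same output text.
$looseResult  = preg_replace('/\b\p{L}{0,2}\b/u', '', $text);
$strictResult = preg_replace('/\b\p{L}{1,2}\b/u', '', $text);
var_dump($looseResult === $strictResult);
```

Since the empty matches contribute nothing to the result, switching to {1,2} should give the same text with less work for the regex engine.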

Upvotes: 1
