Fabbio

Reputation: 363

PHP multibyte-safe preg_replace vs. str_replace

Good day!

I am having trouble with preg_replace and UTF-8 characters. The following code fragment:

$v = "line1\nline2\r\nмы хотели бы поблагодарить";
print $v;
print preg_replace("#\R#", "", $v);
print str_replace("\n", "", $v);

returns the following output:

line1
line2
мы хотели бы поблагодарить

line1line2мы �отели бы поблагодарить

line1line2
мы хотели бы поблагодарить

For some reason the х becomes unreadable when \R is used, but it is unaffected when only \n is replaced. As \R is specific to PCRE, I suppose that is what causes the problem. Does anybody know how I can use \R (which str_replace does not accept) in preg_replace without corrupting the text? I fear this problem might occur in many other cases, not only with the letter х.

Upvotes: 3

Views: 434

Answers (1)

Wiktor Stribiżew

Reputation: 626845

Since your input is Unicode (UTF-8) text, you must pass the /u flag to the regex so that the input is handled correctly:

$v = "line1\nline2\r\nмы хотели бы поблагодарить";
echo preg_replace('/\R/u', "", $v);
// => line1line2мы хотели бы поблагодарить

See IDEONE demo

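The /u flag is required whenever the pattern or the input may contain non-ASCII (UTF-8) characters. Without it, PCRE processes the subject byte by byte, and \R also matches the single byte 0x85 (NEL, "next line"). The letter х (U+0445) is encoded in UTF-8 as the bytes D1 85, so its second byte is stripped and the orphaned D1 is displayed as �.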
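
To check where the � comes from, here is a minimal sketch that inspects the bytes directly. It assumes PHP's default PCRE configuration, in which \R without /u also matches the lone byte 0x85; the variable names are illustrative only, and the letter is written as escaped bytes so the result does not depend on the file's encoding.

```php
<?php
// "х" (U+0445) written as its two UTF-8 bytes, D1 85.
$kha = "\xD1\x85";
echo bin2hex($kha), "\n";     // d185

// Without /u the subject is treated as raw bytes, so \R matches
// the byte 0x85 (NEL) and removes half of the character.
$broken = preg_replace('/\R/', '', $kha);
echo bin2hex($broken), "\n";  // d1

// With /u the subject is decoded as UTF-8 first, so D1 85 is one
// code point (U+0445), which \R does not match.
$fixed = preg_replace('/\R/u', '', $kha);
echo bin2hex($fixed), "\n";   // d185
```

Any other character whose UTF-8 encoding contains the byte 0x85 would probably be damaged the same way, which is why the flag matters beyond this one letter.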

Upvotes: 5
