xuanji

Reputation: 5097

Why do some versions of PHP seem to treat Unicode characters as unprintable?

I ran the following code sample on 3v4l.org (https://3v4l.org/bUqlj):

<?php

var_dump(preg_replace('/[^[:print:]]/u', 'x', "你"));

It seems that some old versions of PHP return "x", which looks like incorrect behaviour. I looked for documentation of this behaviour online but couldn't find any.

Upvotes: 0

Views: 100

Answers (1)

nj_

Reputation: 2339

This appears to be the result of a PCRE issue. The output on 3v4l shows that the expected result starts appearing in the v5.4 series after v5.4.41, and in the v5.5 series after v5.5.10.

Now, looking at the PHP changelogs:

So the upgrade away from PCRE v8.32 fixed the issue (note that the v5.6 series started with PCRE v8.34 in v5.6.0). Looking at the PCRE changelog, we see under the Version 8.34 15-December-2013 section, item 31:

Upgraded the handling of the POSIX classes [:graph:], [:print:], and [:punct:] when PCRE_UCP is set so as to include the same characters as Perl does in Unicode mode.

This looks to be the change that fixed your test case.
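To check whether a given PHP build is affected, something like the following rough sketch may help. It relies on the built-in PCRE_VERSION constant and on version_compare(); the helper names pcreVersionNumber() and printClassMatchesUnicode() are made up for illustration, and the 8.34 threshold is taken from the changelog entry quoted above rather than from any further testing.

```php
<?php

// Hypothetical helper: extract the bare version number from PCRE_VERSION,
// which is a string of the form "<version> <release date>".
function pcreVersionNumber()
{
    $parts = explode(' ', PCRE_VERSION);
    return $parts[0];
}

// Hypothetical helper: behavioural check that does not depend on version
// numbers. Returns true if [:print:] matches a CJK character under /u.
function printClassMatchesUnicode()
{
    return preg_match('/^[[:print:]]$/u', "你") === 1;
}

$version = pcreVersionNumber();

echo "PCRE version: ", $version, "\n";

// Assumption: 8.34 is the first PCRE release containing the fix.
echo "At least PCRE 8.34: ",
    version_compare($version, '8.34', '>=') ? 'yes' : 'no', "\n";

echo "[:print:] matches \"你\" under /u: ",
    printClassMatchesUnicode() ? 'yes' : 'no', "\n";
```

The behavioural check is probably the more reliable of the two, since a distribution may link PHP against a system PCRE rather than the bundled one.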

Upvotes: 1
