Haradzieniec

Reputation: 9340

PHP preg_split and UTF-8 symbols

Could anybody explain why this code

$string='6аd_ТЕХТ GOOD_TEXT';
$words = preg_split('/\s+/', $string, NULL, PREG_SPLIT_NO_EMPTY);

var_dump($words);

displays

array(2) { [0]=> string(8) "6àd_ÒÅÕÒ" [1]=> string(9) "GOOD_TEXT" }

instead of

array(2) { [0]=> string(8) "6аd_ТЕХТ" [1]=> string(9) "GOOD_TEXT" }

I've read about this issue, but adding the /u modifier, i.e. changing

preg_split('/\s+/', $string, NULL, PREG_SPLIT_NO_EMPTY);// '/\s+/'

to

preg_split('/\s+/u', $string, NULL, PREG_SPLIT_NO_EMPTY);// '/\s+/u'

doesn't help. How can I fix this?

Thank you.

Upvotes: 1

Views: 1425

Answers (2)

craniumonempty

Reputation: 3535

... I said it was the slash, but apparently it was the UTF-8 setup that made it work.

EDIT: I removed the rest and found that all I needed to make it work in the browser was the XML line.

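<?php
// Tell both PHP and the browser that the page is UTF-8.
ini_set('default_charset','utf-8');
header('Content-type: text/html; charset=utf-8');

echo '<?xml version="1.0" encoding="UTF-8"?'.'>
<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
</head><body><pre>
';

// The /u modifier makes PCRE treat the pattern and the subject as UTF-8.
$string = "6аd_ТЕХТ GOOD_TEXT";
// -1 means "no limit"; passing NULL for the limit is deprecated as of PHP 8.1.
var_dump(preg_split('/\s+/u', $string, -1, PREG_SPLIT_NO_EMPTY));

echo '</pre></body></html>';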

This is the output:

array(2) {
  [0]=>
  string(13) "6аd_ТЕХТ"
  [1]=>
  string(9) "GOOD_TEXT"
}
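
The string(13) above, compared with string(8) in the question, is a byte count rather than a character count: var_dump reports bytes, and each Cyrillic letter takes two bytes in UTF-8. A minimal sketch of that arithmetic, assuming PHP 7+ (for the "\u{...}" escapes) and the mbstring extension; the variable names are only illustrative:

```php
<?php
// Assumption: PHP 7+ with mbstring loaded. Escapes are used so the example
// does not depend on the encoding this file happens to be saved in.
$word = "6\u{0430}d_\u{0422}\u{0415}\u{0425}\u{0422}"; // "6аd_ТЕХТ" as UTF-8

$bytes = strlen($word);             // counts bytes: 3 ASCII + 5 Cyrillic * 2
$chars = mb_strlen($word, 'UTF-8'); // counts characters

echo "bytes: $bytes, characters: $chars\n"; // prints "bytes: 13, characters: 8"
```

So a length of 8 for this word suggests the string was not UTF-8 to begin with.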

Upvotes: 0

0b10011

Reputation: 18795

There is something else happening in your code that isn't present in the provided example. I tested the provided example and it works as expected. On the off chance that this really is happening (and no other code is affecting $string), it may be a bug in the specific PHP version you're using, which upgrading PHP would solve (but it's highly unlikely that PHP itself is at fault).
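
One candidate for that "something else" is the encoding of the script file itself. The sketch below is a guess, not something confirmed in the thread: it assumes the file was saved as Windows-1251 and the page was displayed as ISO-8859-1, which would reproduce the exact garbled output from the question. It assumes PHP 7+ and the mbstring extension, and the byte values are written out by hand for illustration:

```php
<?php
// Assumption: these are the bytes "6аd_ТЕХТ" would have in a Windows-1251 file.
$cp1251 = "6\xE0d_\xD2\xC5\xD5\xD2";
// The same word as real UTF-8, for comparison.
$utf8 = "6\u{0430}d_\u{0422}\u{0415}\u{0425}\u{0422}";

// What a browser shows if it reads those bytes as ISO-8859-1.
$misread = mb_convert_encoding($cp1251, 'UTF-8', 'ISO-8859-1');
// What the bytes mean when decoded with the charset they were written in.
$decoded = mb_convert_encoding($cp1251, 'UTF-8', 'Windows-1251');

var_dump(strlen($cp1251));                     // 8 bytes, as in the question
var_dump($misread === "6\u{00E0}d_\u{00D2}\u{00C5}\u{00D5}\u{00D2}"); // "6àd_ÒÅÕÒ"
var_dump($decoded === $utf8);                  // decodes to the intended word
var_dump(mb_check_encoding($cp1251, 'UTF-8')); // not valid UTF-8
echo bin2hex($cp1251), "\n";                   // raw bytes, useful for diagnosis
```

If bin2hex() on your own $string shows single bytes such as e0 or d2 for the Cyrillic letters, re-saving the script as UTF-8 and sending a UTF-8 charset header is the more likely fix than changing the regex.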

Upvotes: 1
