Reputation: 386
I have two string variables. The first is set manually in the code ($date1="14 июня"); the second is parsed from a remote page using cURL and phpQuery.
If we print both variables, the result looks the same, but length and content are different:
echo $date1; //output: 14 июня
echo $date2; //output: 14 июня
echo $date1[2]; //output is space - third symbol in string
echo $date2[2]; //output is � - it's a part of third symbol in string
echo strlen($date1); //output: 7
echo strlen($date2); //output: 12
echo mb_detect_encoding($date1); //output: UTF-8
echo mb_detect_encoding($date2); //output: UTF-8
Is there a way to convert $date2 to the format/encoding of $date1?
P.S. There is an SO topic about iconv(), but I was unable to find a working solution there.
Upvotes: 0
Views: 1590
Reputation: 255005
So you have two strings (shown here as hex dumps):
313420d0b8d18ed0bdd18f - this uses the 0x20 character as a space.
3134c2a0d0b8d18ed0bdd18f - this uses the byte sequence 0xC2 0xA0 as a space (the UTF-8 encoding of U+00A0, Unicode's non-breaking space).
Apart from those spaces, the strings are identical.
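If you want to check this yourself, a minimal sketch along these lines should show the difference. It rebuilds the two strings from the hex dumps above with hex2bin() rather than scraping anything, and it assumes the mbstring extension is available (the question already uses mb_detect_encoding()).

```php
<?php
// Rebuild both strings from the hex dumps quoted above.
$date1 = hex2bin('313420d0b8d18ed0bdd18f');   // regular space (0x20)
$date2 = hex2bin('3134c2a0d0b8d18ed0bdd18f'); // no-break space (0xC2 0xA0)

// bin2hex() exposes the raw bytes that echo renders almost identically.
echo bin2hex($date1), "\n";
echo bin2hex($date2), "\n";

// Same number of characters in both strings...
var_dump(mb_strlen($date1, 'UTF-8') === mb_strlen($date2, 'UTF-8')); // bool(true)

// ...but the second one is one byte longer, because U+00A0 takes two bytes in UTF-8.
var_dump(strlen($date2) - strlen($date1)); // int(1)
```

Running bin2hex() on the scraped value is usually the quickest way to spot invisible characters like this one.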
To replace space-like Unicode characters with a regular space, you can use the following regular expression:
preg_replace('~\p{Zs}~u', ' ', $str)
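As a rough sketch of how that call could be used on the values from the question: normalizeSpaces() below is a hypothetical helper name, not something from phpQuery or PHP itself, and the example assumes PHP 7 or later and a source file saved as UTF-8. The scraped value is simulated with a "\u{00A0}" escape.

```php
<?php
// Hypothetical helper: replace every Unicode space separator (\p{Zs}) with U+0020.
function normalizeSpaces(string $str): string
{
    // The u modifier makes PCRE treat both pattern and subject as UTF-8.
    $result = preg_replace('~\p{Zs}~u', ' ', $str);

    // preg_replace() returns null on error (e.g. invalid UTF-8 input);
    // in that case fall back to the unchanged input.
    return $result ?? $str;
}

$date1 = "14 июня";          // typed in the source file, regular space
$date2 = "14\u{00A0}июня";   // simulates the scraped value with a no-break space

var_dump($date1 === $date2);                  // bool(false)
var_dump($date1 === normalizeSpaces($date2)); // bool(true)
```

Note that \p{Zs} matches the ordinary space as well, so the replacement leaves already-normal strings unchanged.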
Upvotes: 3