Reputation: 738
I am grabbing input from a file with the following code
$jap= str_replace("\n","",addslashes(strtolower(trim(fgets($fh), " \t\n\r"))));
i had also previously tried these while troubleshooting
$jap= str_replace("\n","",addslashes(strtolower(trim(fgets($fh)))));
$jap= addslashes(strtolower(trim(fgets($fh), " \t\n\r")));
and if I echo $jap it looks fine, so later in the code, without any other alterations to $jap it is inserted into the DB, however i noticed a comparison test that checks if this jap is already in the DB returned false when i can plainly see that a seemingly exact same entry of jap is in the DB. So I copy the jap entry that was inserted right from phpmyadmin or from my site where the jap is displayed and paste into a notepad i notice that it paste like this... (this is an exact paste into the below quotes)
"
バスにのって、うみへ行きました"
and obviously i need, it without that white space and breaks or whatever it is.
so as far as I can tell the trim is not doing what it says it will do. or im missing something here. if so what is it?
UPDATE: with regards to Jacks answer
the preg_replace did not help but here is what i did, i used the bin2hex() to determine that the part that "is not the part i want" is efbbbf i did this by taking $jap into str replace and removing the japanese i am expecting to find, and what is left goes into the bin2hex. and the result was the above "efbbbf"
echo bin2hex(str_replace("どちらがあなたの本ですか","",$jap));
output of the above was efbbbf but what is it? can i make a str_replace to remove this somehow?
Upvotes: 18
Views: 19634
Reputation: 173532
The trim
function doesn't know about Unicode white spaces. You could try this:
preg_replace('/^\p{Z}+|\p{Z}+$/u', '', $str);
As taken from: Trim unicode whitespace in PHP 5.2
Otherwise, you can do a bin2hex()
to find out what characters are being added at the front.
Update
Your file contains a UTF8 BOM; to remove it:
$f = fopen("file.txt", "r");
$s = fread($f, 3);
if ($s !== "\xef\xbb\xbf") {
// bom not found, rewind file
fseek($f, 0, SEEK_SET);
}
// continue reading here
Upvotes: 45