Reputation: 2495
I am processing a large text file that contains names with an occasional special character, for example "CASTA¥EDA, JASON". When I process the file, the line comes across as UTF-8. However, when the insert into Mongo is about to happen, it throws this error:
[MongoDB\Driver\Exception\UnexpectedValueException]
Got invalid UTF-8 value serializing 'Jason Casta�eda'
I then proceeded to do this:
$name = iconv("UTF-8","UTF-8//IGNORE",$name);
And now this produces: Jason Castaeda
Is there a way to find out whether a name contains characters that are not valid UTF-8?
Ideally, I would like to know whether a line of the file has characters that will not make the cut to Mongo. Any tips?
I could take the length of the name, run it through iconv, and compare the string lengths, but that seems crude. Is there a better way?
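The length comparison described above can be sketched as follows. This is only a minimal sketch: the helper name hasInvalidUtf8 is hypothetical, and it assumes the iconv extension is enabled. Some PHP builds return false from iconv (with a notice) instead of dropping the bad bytes, so that case is treated as invalid too.

```php
<?php
// Hypothetical helper: reports whether iconv would drop bytes from $line,
// i.e. whether the line contains byte sequences that are not valid UTF-8.
function hasInvalidUtf8(string $line): bool
{
    // @ silences the "illegal character" notice some builds raise here.
    $clean = @iconv('UTF-8', 'UTF-8//IGNORE', $line);

    // false means iconv gave up entirely; a shorter result means bytes were dropped.
    return $clean === false || strlen($clean) !== strlen($line);
}
```

Calling hasInvalidUtf8() on each line before the insert would flag the offending names without altering them. If the mbstring extension is available, mb_check_encoding($line, 'UTF-8') answers the same question directly.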
Upvotes: 0
Views: 725
Reputation: 2894
I'd recommend converting all user-input strings into a Buffer.
Or check for insert/update errors by using the ACKNOWLEDGED write concern.
EDIT: Sorry about that, I totally ignored the php tag. Try this:
function bin2text($bin_str) {
    $text_str = '';
    // Strip newlines, then read the bit string 8 characters (one byte) at a time.
    foreach (str_split(str_replace("\n", '', $bin_str), 8) as $byte) {
        $text_str .= chr(bindec($byte));
    }
    return $text_str;
}
function text2bin($txt_str) {
    $len = strlen($txt_str);
    $bin = '';
    for ($i = 0; $i < $len; $i++)
    {
        // Left-pad each byte's binary form with zeros to exactly 8 digits.
        $bin .= str_pad(decbin(ord($txt_str[$i])), 8, '0', STR_PAD_LEFT);
    }
    return $bin;
}
Taken from: http://psoug.org/snippet/PHP-Binary-to-Text-Text-to-Binary_380.htm
text2bin() converts your string into a string of binary digits, and bin2text() converts those binary digits back into text.
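PHP's built-in bin2hex() and hex2bin() give a similar byte-level round trip and make a stray byte easy to spot. A minimal sketch, where the sample name and its 0xA5 byte are assumptions based on the question:

```php
<?php
// Sample name with a lone 0xA5 byte, which is not valid UTF-8 on its own.
$name = "Casta\xA5eda";

// Two hex digits per byte, so the stray "a5" stands out.
$hex = bin2hex($name);

// hex2bin() restores the original bytes unchanged.
$restored = hex2bin($hex);

echo $hex, PHP_EOL;
```

Like the binary-string version, this only re-encodes the bytes for inspection or storage; it does not repair the character.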
Upvotes: 1