kratos

Reputation: 2495

MongoDB and names with special characters

I am processing a large text file that contains names with an occasional special character. An example would be "CASTA¥EDA, JASON". When I process the file, the line comes across as UTF-8. However, just before the insert into Mongo happens, this error is thrown:

[MongoDB\Driver\Exception\UnexpectedValueException]
  Got invalid UTF-8 value serializing 'Jason Casta�eda'

I then proceeded to do this:

  $name = iconv("UTF-8","UTF-8//IGNORE",$name);

And now this produces: Jason Castaeda

Is there a way to find out whether a name contains bytes that are not valid UTF-8?

Ideally it would be nice to know whether a line of the file has characters that will not make the cut to Mongo. Any tips?

I mean, I could take the length of the name before, run iconv, and compare the string lengths, but that seems trivial. Is there a better way?

Upvotes: 0

Views: 725

Answers (1)

Tom M

Reputation: 2894

I'd recommend converting all user-input strings into a Buffer. Alternatively, check for insert/update errors by using an acknowledged write concern.

EDIT: Sorry about that, I totally ignored the PHP tag. Try this:

// Convert a string of '0'/'1' characters (8 per byte) back into text.
function bin2text($bin_str) {
    $text_str = '';
    // Drop any line breaks, then read the bits one byte (8 bits) at a time.
    $chars = str_split(str_replace(array("\r", "\n"), '', $bin_str), 8);
    foreach ($chars as $bits) {
        $text_str .= chr(bindec($bits));
    }
    return $text_str;
}

// Convert text into a string of '0'/'1' characters, 8 bits per byte.
function text2bin($txt_str) {
    $len = strlen($txt_str);
    $bin = '';
    for ($i = 0; $i < $len; $i++) {
        // Left-pad each byte's binary form with zeros to a fixed width of 8.
        $bin .= str_pad(decbin(ord($txt_str[$i])), 8, '0', STR_PAD_LEFT);
    }
    return $bin;
}

Taken from: http://psoug.org/snippet/PHP-Binary-to-Text-Text-to-Binary_380.htm

text2bin() converts your string into a string of binary digits (8 bits per byte), and bin2text() converts those binary digits back into text.
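For the detection part of the question, here is a minimal sketch that assumes the mbstring extension is loaded and PHP 7.1 or later. It uses mb_check_encoding() to test whether a string is well-formed UTF-8, so problem lines can be flagged before they reach the driver. The helper names is_valid_utf8 and find_invalid_lines are made up for illustration and do not belong to any library.

```php
<?php
// Hypothetical helper: true when $s is well-formed UTF-8.
function is_valid_utf8(string $s): bool
{
    return mb_check_encoding($s, 'UTF-8');
}

// Hypothetical helper: returns the 1-based numbers of the lines
// that are NOT valid UTF-8, so they can be logged or re-encoded.
function find_invalid_lines(iterable $lines): array
{
    $bad = [];
    $n = 0;
    foreach ($lines as $line) {
        $n++;
        if (!is_valid_utf8($line)) {
            $bad[] = $n;
        }
    }
    return $bad;
}
```

For a large file you could pass an SplFileObject, which yields one line at a time, e.g. find_invalid_lines(new SplFileObject('names.txt')), where names.txt is a placeholder path. Once a line is flagged, converting it from its real source encoding (which has to be known or guessed) is likely a better fix than dropping bytes with //IGNORE.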

Upvotes: 1
