Reputation: 95
I have a table with the following columns:
users(
    id       SERIAL,
    username VARCHAR(20),
    password VARCHAR(64),
    salt     VARCHAR(32),
    name     VARCHAR(50),
    joined   TIMESTAMP WITHOUT TIME ZONE,
    grupo    INTEGER
)
The database encoding is UTF8.
PDO connection:
private function __construct(){
    try{
        // Build the pgsql DSN from the application config.
        $this->_pdo = new PDO(
            'pgsql:host=' . Config::get('pgsql/host')
            . ';port=' . Config::get('pgsql/port')
            . ';dbname=' . Config::get('pgsql/db')
            . ';user=' . Config::get('pgsql/username')
            . ';password=' . Config::get('pgsql/password')
        );
        $this->_pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    }catch(PDOException $e){
        die($e->getMessage());
    }
}
I'm using PDO, and this is the method I use to insert data:
public function query($sql, $params = array()){
    $this->_error = false;
    if($this->_query = $this->_pdo->prepare($sql)){
        $this->_query->execute($params);
    }
}
The SQL passed is the following:
INSERT INTO users(username, password, salt, name, joined, grupo) VALUES(?, ?, ?, ?, ?, ?)
And the array passed is:
Array ( [0] => nath
[1] => 81033b63c09fd9104977fdb0ef70b5dc627fd9a6e90d0d400706603def8c22a6
[2] => KwjWC57AO0Gh1VvSUuJpDMNkEiraBzFL
[3] => Nathália
[4] => 2014-03-06 19:35:01
[5] => 1 )
When I run this, I get the following error:
SQLSTATE[22021]: Character not in repertoire: 7 ERROR: invalid byte sequence for encoding "UTF8": 0xe1 0x6c 0x69
PS: If I type Nathalia instead of Nathália it works perfectly.
Trying to figure out what was going on, I inserted field by field, like this:
if($this->_query = $this->_pdo->prepare("INSERT INTO users(username) VALUES(?)")){
    $this->_query->execute(array('nath'));
}
And it worked OK. Then I replaced username with password and array('nath') with array('81033b63c09fd9104977fdb0ef70b5dc627fd9a6e90d0d400706603def8c22a6'), and did the same for the other fields.
Everything worked perfectly when I inserted field by field. Any clues of what is going on?
Upvotes: 1
Views: 1892
Reputation: 61546
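The error message is specific about the problem:
0xe1 0x6c 0x69
0xe1 is á in iso-8859-1, not in utf-8, where á is the two-byte sequence 0xc3 0xa1. In utf-8, 0xe1 can only start a three-byte sequence, which is why the message quotes it together with the two bytes that follow.
Those other two bytes represent characters in the US-ASCII range (l and i), so they share the same byte representation in iso-8859-1 and utf-8.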
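To make the byte-level difference concrete, here is a small sketch that builds the name in both encodings from explicit bytes and checks each one for UTF-8 validity. The helper name isValidUtf8 is made up for this example; it relies on preg_match with the /u modifier, which fails on a subject that is not valid UTF-8.

```php
<?php
// Build "Nathália" from explicit bytes so the result does not depend on
// the encoding in which this source file is saved.
$latin1 = "Nath\xE1lia";      // á as the single byte 0xE1 (ISO-8859-1)
$utf8   = "Nath\xC3\xA1lia";  // á as the two bytes 0xC3 0xA1 (UTF-8)

// Hypothetical helper: preg_match() with /u returns false (not 0 or 1)
// when the subject string is not valid UTF-8.
function isValidUtf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

echo bin2hex($latin1), "\n";     // 4e617468e16c6961
echo bin2hex($utf8), "\n";       // 4e617468c3a16c6961
var_dump(isValidUtf8($latin1));  // bool(false)
var_dump(isValidUtf8($utf8));    // bool(true)
```

The first hex dump contains the same e1 6c 69 run that PostgreSQL complains about.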
Your script is sending iso-8859-1 encoded text instead of utf-8 encoded text. You should find out where the Nathália string comes from and how it is supposed to be encoded.
If it's supposed to be in utf-8, then the bug is in whatever produces that string. If it's supposed to be in ISO-latin, then your script must apply utf8_encode to it before feeding it to the utf-8 database connection.
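As a rough sketch of that second case, the parameters could be normalized just before they reach execute(). Everything named here is an assumption for illustration: toUtf8 is a hypothetical helper, the $params array is a shortened stand-in for the one in the question, and the code assumes that any string which is not valid UTF-8 is ISO-8859-1. It uses iconv rather than utf8_encode because utf8_encode is deprecated as of PHP 8.2; for ISO-8859-1 input the result is the same.

```php
<?php
// Hypothetical helper: return $value as UTF-8.
// Assumption: input that is not already valid UTF-8 is ISO-8859-1.
function toUtf8(string $value): string
{
    // Already valid UTF-8 (this includes plain ASCII): leave untouched,
    // so correctly encoded input is not double-encoded.
    if (preg_match('//u', $value) === 1) {
        return $value;
    }
    $converted = iconv('ISO-8859-1', 'UTF-8', $value);
    if ($converted === false) {
        throw new RuntimeException('Could not convert value to UTF-8');
    }
    return $converted;
}

// Stand-in for the question's parameter array, with the name given as
// ISO-8859-1 bytes (á = 0xE1), which is what triggers the error.
$params = array('nath', "Nath\xE1lia");

// Normalize every string parameter before handing it to execute().
$params = array_map('toUtf8', $params);

echo bin2hex($params[1]), "\n"; // 4e617468c3a16c6961
```

Guessing the encoding like this is only a stopgap. If the source of the string can be fixed to produce utf-8 in the first place, that is the cleaner solution.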
Upvotes: 2