Olivier Langelaar

Reputation: 186

PHP MySQL utf-8 Euro symbol shown as question mark in a diamond

So after a whole day of googling and debugging I end up here.

MySQL

set to the following encoding:

db:  utf8_general_ci
table: utf8_general_ci
column: utf8_general_ci, TEXT

I put in some euro symbols and some other non-ASCII characters:

acentuação €€€€€

PHP (codeigniter)

config

$config['charset'] = 'UTF-8';

dsn

char_set=utf8,dbcollat=utf8_general_ci

I made some queries to compare

model

$query = $this->db->query("SET NAMES latin1");
$query = $this->db->query("SELECT shortdesc,HEX(shortdesc) FROM `contracttypes` WHERE id = 4");
$ret['latin1'] =  $query->row();
$query = $this->db->query("SET NAMES utf8");
$query = $this->db->query("SELECT shortdesc,HEX(shortdesc) FROM `contracttypes` WHERE id = 4");
$ret['utf8'] =  $query->row();
return $ret;

controller

public function utfhell()
{
    var_dump($this->campagne_model->utfhell());
}

This outputs

array (size=2)
'latin1' => 
object(stdClass)[34]
public 'shortdesc' => string 'acentua��o �����' (length=16)
public 'HEX(shortdesc)' => string '6163656E747561C3A7C3A36F20E282ACE282ACE282ACE282ACE282AC' (length=56)
'utf8' => 
object(stdClass)[33]
public 'shortdesc' => string 'acentuação €€€€€' (length=28)
public 'HEX(shortdesc)' => string '6163656E747561C3A7C3A36F20E282ACE282ACE282ACE282ACE282AC' (length=56)

So far so good, on to a

view

<?php header('Content-Type: text/html; charset="utf-8"', true); ?>
<!doctype html>
<html>
<head>
<title>UTFhell</title>
<link rel="stylesheet" href="../assets/css/style.css"/>
<meta charset="utf-8">
<meta http-equiv="Content-Type" content="text/html;charset=utf-8">
...
<?php
echo 'Original : ', $campagne_info->contractName->shortdesc."<br />";
echo 'UTF8 Encode : ', utf8_encode($campagne_info->contractName->shortdesc)."<br />";
echo 'UTF8 Decode : ', utf8_decode($campagne_info->contractName->shortdesc)."<br />";
echo 'TRANSLIT : ', iconv("ISO-8859-1", "UTF-8//TRANSLIT", $campagne_info->contractName->shortdesc)."<br />";
echo 'IGNORE TRANSLIT : ', iconv("ISO-8859-1", "UTF-8//IGNORE//TRANSLIT", $campagne_info->contractName->shortdesc)."<br />";
echo 'IGNORE   : ', iconv("ISO-8859-1", "UTF-8//IGNORE", $campagne_info->contractName->shortdesc)."<br />";
echo 'Plain: ', iconv("ISO-8859-1", "UTF-8", $campagne_info->contractName->shortdesc)."<br />";
echo '€€€€€€€€€€<br>'; 
?>

None of these shows a normal euro symbol except the final echo statement. All the others give me question-mark diamonds in place of the euro symbols.

Upvotes: 2

Views: 3461

Answers (1)

Rick James

Reputation: 142560

The HEX is the utf8 encoding for that string. So the data is in the table 'correctly'.
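
A quick way to check this claim is to turn the HEX value from the question back into bytes and ask PHP whether they are well-formed UTF-8. This is only a sketch; it assumes the mbstring extension is loaded, and the variable names are made up for illustration.

```php
<?php
// HEX(shortdesc) exactly as printed in the question's var_dump output.
$hex = '6163656E747561C3A7C3A36F20E282ACE282ACE282ACE282ACE282AC';

// Turn the hex digits back into the raw bytes MySQL has stored.
$bytes = hex2bin($hex);

// True when the byte sequence is well-formed UTF-8 (needs mbstring).
$isUtf8 = mb_check_encoding($bytes, 'UTF-8');

var_dump(strlen($bytes)); // int(28), the same length var_dump reported for the 'utf8' row
var_dump($isUtf8);        // bool(true)
```

The trailing `E282AC` groups are the three-byte UTF-8 encoding of the euro sign, and `C3A7` / `C3A3` are ç and ã.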

The black diamond (�) is the browser's way of saying wtf. It comes from having latin1 characters, but telling the browser to display utf8 characters.

You could tell the browser to display "Western", but that only sidesteps the underlying problem. Remember, the goal is to really use utf8.

Sometimes this occurs together with Question Marks, in which case you must start over.

The cause (probably):

  1. The bytes you had were encoded latin1. You acquired them from somewhere -- file, dump, online input, etc.
  2. The connection parameters said latin1.
  3. The column/table is declared to be CHARACTER SET utf8, so during INSERT, they were correctly converted.
  4. When SELECTing, the setting in step 2 was again latin1, so they were converted back to latin1.
  5. When displaying text in a web page, the page's header said that the bytes were utf8.
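
Steps 4 and 5 can be imitated in plain PHP, without a database, to see why the browser gives up. The sketch below assumes that MySQL's latin1 behaves like Windows-1252 (which is why the euro sign fits in one byte) and that mbstring is available; the variable names are illustrative only.

```php
<?php
// The stored text, written with explicit UTF-8 byte escapes so the
// encoding of this source file does not matter.
$utf8 = "acentua\xC3\xA7\xC3\xA3o " . str_repeat("\xE2\x82\xAC", 5);

// Step 4: what a latin1 connection hands back to PHP.
// Assumption: MySQL latin1 is approximated here by Windows-1252.
$latin1 = mb_convert_encoding($utf8, 'Windows-1252', 'UTF-8');

var_dump(strlen($utf8));   // int(28)
var_dump(strlen($latin1)); // int(16), matching the 'latin1' row in the question

// Step 5: the page claims UTF-8, but these bytes are not valid UTF-8,
// so the browser substitutes the black diamond for each bad byte.
$validAsUtf8 = mb_check_encoding($latin1, 'UTF-8');
var_dump($validAsUtf8);    // bool(false)
```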

Solution, Plan A: (Sloppy, but probably workable)

Change #5 to say the appropriate equivalent of latin1.

Solution, Plan B:

  1. Fix the source to be utf8-encoded
  2. query("SET NAMES utf8") (unless there is a way to set it at connect time)
  3. Leave the table/column at CHARACTER SET utf8
  4. Step 2 covers this.
  5. Leave <meta ... UTF-8>.
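
For step 2, one way to set the charset at connect time is `mysqli::set_charset()`, which has the same effect as SET NAMES for that connection. The following is a minimal sketch using plain mysqli rather than CodeIgniter's database layer; the function name and its parameters are hypothetical, and in CodeIgniter the `char_set=utf8` DSN option from the question is meant to do the same job.

```php
<?php
// Hypothetical helper: opens a connection and makes it talk utf8 from the start.
// The credentials are supplied by the caller; nothing here is taken from the question.
function open_utf8_connection(string $host, string $user, string $pass, string $db): mysqli
{
    $mysqli = new mysqli($host, $user, $pass, $db);

    // Equivalent in effect to query("SET NAMES utf8"), done once per connection.
    if (!$mysqli->set_charset('utf8')) {
        throw new RuntimeException('Could not set connection charset: ' . $mysqli->error);
    }

    return $mysqli;
}
```

With the connection set this way, the SELECT in step 4 returns utf8 bytes, which is what the page header in step 5 already promises.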

Upvotes: 1
