TheAptKid

Wrong character encoding

I have two forms on two different pages that are used to insert data into a MySQL database. The form data contains special characters like 'čšžćđ', which I pass via the forms to the insertion scripts.

The data from the first form gets inserted correctly, while some fields from the second form contain '?' characters instead, which suggests an encoding mismatch.

Both insertion scripts use the same file to connect to the database and set the encoding:

<?php

$username = "root";
$password = "";
$servername = "localhost";

$conn = mysqli_connect($servername, $username, $password);

if (!$conn) {  // check if connected
    // die() prints the message and stops the script, so no exit() is needed
    die("Connection failed: " . mysqli_connect_error());
}

/* change character set to utf8 */
if (!mysqli_set_charset($conn, "utf8")) {
    // printf("Error loading character set utf8: %s\n", mysqli_error($conn));
} else {
    // printf("Current character set: %s\n", mysqli_character_set_name($conn));
}

// Select the database only after the connection is known to be valid
mysqli_select_db($conn, "testdb");
//echo "Connected successfully.";

// Check if the correct db is selected
if ($result = mysqli_query($conn, "SELECT DATABASE()")) {
    $row = mysqli_fetch_row($result);
    //printf("\nDefault database is %s.\n", $row[0]);
    mysqli_free_result($result);
}
?>

Does this mean that the client character encoding isn't set correctly? All database tables have the utf8 character set.

Upvotes: 0

Answers (2)

Rick James

Are you talking about HTML forms? If so, declare the character set on the form:

<form accept-charset="UTF-8">
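
A minimal sketch of a form page that is UTF-8 end to end, assuming the insertion script is called insert.php and the text field is named name (both hypothetical, not from the question). The meta tag declares the page itself as UTF-8, and accept-charset asks the browser to submit the form data as UTF-8.

```php
<?php
// Hypothetical helper: builds a form page declared as UTF-8 throughout.
// "insert.php" and the field name "name" are placeholders.
function render_utf8_form(string $action): string
{
    // Escape the action URL so it is safe inside an HTML attribute
    $safeAction = htmlspecialchars($action, ENT_QUOTES, 'UTF-8');

    return <<<HTML
<!DOCTYPE html>
<html>
<head>
    <meta charset="UTF-8">
</head>
<body>
    <form method="post" action="{$safeAction}" accept-charset="UTF-8">
        <input type="text" name="name">
        <input type="submit" value="Save">
    </form>
</body>
</html>
HTML;
}

// Usage: print the page
echo render_utf8_form('insert.php');
```

Declaring the encoding in both places matters because browsers default to submitting form data in the encoding of the page that contains the form.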

Is it one ? per accented character? When trying to use utf8/utf8mb4, if you see question marks (regular ones, not black diamonds), check the following:

  • The bytes to be stored are not encoded as utf8. Fix this.
  • The column in the database is not CHARACTER SET utf8 (or utf8mb4). Fix this.
  • Also, check that the connection during reading is utf8.

The data was probably converted to ? when it was stored, so the original text cannot be recovered.
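
The loss can be reproduced in plain PHP. This sketch assumes the mbstring extension is enabled and uses ISO-8859-1 as a stand-in for a column or connection character set that has no room for these letters; mb_convert_encoding() then writes the substitute character ? for each one, similar to what MySQL does, and converting back cannot bring the letters back.

```php
<?php
// Assumes the mbstring extension is enabled.
// ISO-8859-1 stands in for a target character set that lacks č, š, ž, ć and đ.
mb_substitute_character(0x3F);  // make '?' the substitute character explicitly

$original = 'čšžćđ';

// Each letter with no ISO-8859-1 equivalent is replaced by '?'
$stored = mb_convert_encoding($original, 'ISO-8859-1', 'UTF-8');

echo $stored, "\n";           // ?????
echo bin2hex($stored), "\n";  // 3f3f3f3f3f

// Converting back cannot restore the letters: only the '?' bytes are left
echo mb_convert_encoding($stored, 'UTF-8', 'ISO-8859-1'), "\n";  // ?????
```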

Run SELECT col, HEX(col) FROM ... to see what actually got stored:

  • ? is 3F in hex.
  • Accented European letters will have two bytes (four hex digits) per character. That includes each of čšžćđ.
  • Chinese, Japanese, and Korean will (mostly) have three bytes (six hex digits) per character.
  • Four bytes (eight hex digits) per accented European letter would indicate "double encoding".
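
To know what HEX(col) should look like, the same bytes can be computed in PHP with bin2hex(). The sketch below shows that each of čšžćđ is two bytes in UTF-8, and simulates double encoding by treating those UTF-8 bytes as latin1 (ISO-8859-1) and encoding them to UTF-8 a second time, which turns each letter into four bytes. The helper latin1_to_utf8() is hypothetical and written out by hand so the sketch needs no extensions; MySQL's own latin1 is actually cp1252-based, so treat this as an approximation.

```php
<?php
// Hypothetical helper: reinterpret each byte as an ISO-8859-1 character
// and encode it as UTF-8. Applied to text that is already UTF-8, this
// reproduces "double encoding".
function latin1_to_utf8(string $bytes): string
{
    $out = '';
    foreach (str_split($bytes) as $byte) {
        $b = ord($byte);
        if ($b < 0x80) {
            $out .= $byte;                      // ASCII stays one byte
        } else {
            $out .= chr(0xC0 | ($b >> 6))       // two-byte UTF-8 sequence
                  . chr(0x80 | ($b & 0x3F));
        }
    }
    return $out;
}

// Split the UTF-8 string into characters (the /u flag treats input as UTF-8)
foreach (preg_split('//u', 'čšžćđ', -1, PREG_SPLIT_NO_EMPTY) as $char) {
    $once  = bin2hex($char);                  // what HEX(col) should show
    $twice = bin2hex(latin1_to_utf8($char));  // what double encoding looks like
    printf("%s  utf8: %s (%d bytes)  double: %s (%d bytes)\n",
        $char, $once, strlen($once) / 2, $twice, strlen($twice) / 2);
}
```

For example, č should show up as C48D; C384C28D in the HEX output would point to double encoding.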

Upvotes: 2

Marko Milivojevic

Try setting the encoding at the top of the page:

<?php

// Must be called before any output is sent to the browser
header('Content-Type: text/html; charset=utf-8');

// other code...

Upvotes: 2
