DDDD

Reputation: 3940

Form saves special latin characters as symbols

My PHP form is submitting special latin characters as symbols.

So, Québec turns into QuÃ©bec

My form is set to UTF-8 and my database table has latin1_swedish_ci collation.

PHP: $db = new PDO('mysql:host=localhost;dbname=x;charset=utf8', 'x', 'x');

A bindParam: $sql->bindParam(":x", $_POST['x'],PDO::PARAM_STR);

I am new to PDO so I am not sure what the problem is. Thank you

*I am using phpMyAdmin

Upvotes: 1

Views: 842

Answers (4)

KathyA.

Reputation: 681

To expand a little bit more on the encoding problem...

Any time you see one character in a source turn into two (or more) characters, you should immediately suspect an encoding issue, especially if UTF-8 is involved. Here's why. (I apologize if you already know some of this, but I hope to help some future SO'ers as well.)

All characters are stored in your computer not as characters, but as bytes. Back in the olden days, space and transmission time were much more limited than now, so people tried to save every byte possible, even down to not using a full byte to store a character. Now, because we realize that we need to communicate with the whole world, we've decided it's more important to be able to represent every character in every language. That transition hasn't always been smooth, and that's what you're running up against.

Latin-1 (in various flavors) is an encoding that always uses a single 8-bit byte for a character, which means it can only represent 256 possible characters. That's plenty if you only want to write English or Swedish, but not enough to add Russian and Chinese. (background on Latin-1)

UTF-8 encodes the first half of Latin-1 (the ASCII range) in exactly the same way, which is why most of the characters look the same. But it doesn't always use a single byte for a character -- it can use up to four bytes for one character. (utf-8) As you discovered, it uses 2 bytes for é. Software that reads those bytes as Latin-1 doesn't know that, and does its best by displaying each of the two bytes as a separate character.
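
To make the two-bytes point concrete, here is a minimal PHP sketch. It assumes the iconv extension is available, and the variable names are only illustrative. The é is written as explicit byte escapes so the result does not depend on how the script file itself is saved.

```php
<?php
// "Québec" in UTF-8, with é spelled out as its two bytes (0xC3 0xA9).
$utf8 = "Qu\xC3\xA9bec";

// strlen() counts bytes, not characters: 6 characters, 7 bytes.
$byteLength = strlen($utf8);

// Pretend the bytes are Latin-1: each byte becomes its own character,
// then re-encode as UTF-8 so the result can be displayed.
$misread = iconv('ISO-8859-1', 'UTF-8', $utf8);

echo $byteLength, "\n";            // 7
echo bin2hex("\xC3\xA9"), "\n";    // c3a9
echo $misread, "\n";               // QuÃ©bec
```

The last line reproduces the symptom from the question: byte C3 shows up as Ã and byte A9 as ©.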

The trick is to always specify your encoding for byte streams (like info from a file, a URL, or a database), and to make sure that encoding is correct. (Sometimes that's a pain to find out, for sure.) Most modern languages, like Java and PHP, do a good job of handling all the translation issues between different encodings, as long as you've correctly specified what you're dealing with.

Upvotes: 1

Ken Keenan

Reputation: 10568

You've pretty much answered your own question: you're receiving UTF-8 from the form but trying to store it in a Latin-1 column. You can either change the encoding on the column in MySQL or use the iconv function to translate between the two encodings.
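
A rough sketch of the iconv route, assuming the iconv extension is available and that the connection is not already converting for you (if the DSN sets a charset, as in the question, MySQL may perform this conversion itself, and converting again in PHP would corrupt the data). The variable names are only illustrative.

```php
<?php
// Value as it arrives from a UTF-8 form: é is the two bytes C3 A9.
$fromForm = "Qu\xC3\xA9bec";

// Convert to Latin-1 before storing: é becomes the single byte E9.
// iconv() returns false if a character has no Latin-1 equivalent.
$forLatin1 = iconv('UTF-8', 'ISO-8859-1', $fromForm);

// Convert back to UTF-8 when reading the value out for a UTF-8 page.
$backToUtf8 = iconv('ISO-8859-1', 'UTF-8', $forLatin1);

echo bin2hex($forLatin1), "\n";                      // 5175e9626563
echo ($backToUtf8 === $fromForm ? 'same' : 'different'), "\n"; // same
```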

Upvotes: 0

Suvash sarker

Reputation: 3170

Make sure you are saving the file with UTF-8 encoding (this is often overlooked).

Set the HTTP header and the matching meta tag:

<?php header("Content-type: text/html; charset=utf-8"); ?>

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

Upvotes: 0

Kristiyan

Reputation: 1663

Change your database table and column to utf8_unicode_ci.
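
One way this could look, as a sketch only: the helper name buildConvertStatement and the table name places are made up for illustration. It just builds MySQL's ALTER TABLE ... CONVERT TO CHARACTER SET statement. Back up the table first, and note that rows which were already stored garbled will not be repaired by the conversion.

```php
<?php
// Hypothetical helper: builds the statement that converts a table and
// its text columns to the given character set and collation.
function buildConvertStatement(
    string $table,
    string $charset = 'utf8',
    string $collation = 'utf8_unicode_ci'
): string {
    // Identifiers cannot be bound as PDO parameters, so only accept
    // plain names made of letters, digits and underscores.
    foreach ([$table, $charset, $collation] as $name) {
        if (!preg_match('/^[A-Za-z0-9_]+$/', $name)) {
            throw new InvalidArgumentException("Unsafe identifier: $name");
        }
    }
    return "ALTER TABLE `$table` CONVERT TO CHARACTER SET $charset COLLATE $collation";
}

// 'places' is a placeholder table name.
$sql = buildConvertStatement('places');
echo $sql, "\n";

// With the PDO connection from the question it could then be run as:
// $db->exec($sql);
```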

Upvotes: 0
