Reputation: 610
My webpage has a form with multiple inputs. However, the characters in the input values render differently from the characters in the input labels. I tried saving the file as UTF-8 and as UTF-8 + BOM (I'm using EditPlus).
Using UTF-8:
Using UTF-8 + BOM:
The input values come from a MySQL database whose collation is utf8_unicode_ci (set through phpMyAdmin), so I don't know whether that is the source of the problem. Any ideas?
Upvotes: 0
Views: 508
Reputation: 610
Solved it: I just changed the file encoding to "Western European (Windows) 1252" (using EditPlus) and now every character is shown correctly.
Upvotes: 0
Reputation: 522042
This means the two pieces of data are not in the same encoding. If the file is interpreted as Latin-1 (or a similar encoding), you get the first result, in which the data in the input field is valid (meaning it's Latin-1 encoded) but the label is wrong (meaning it's not Latin-1 encoded). When the file is interpreted as UTF-8, the label is correct (meaning it's UTF-8 encoded) but the data in the input field is wrong (meaning it's not UTF-8 encoded). If data shows up as the � UNICODE REPLACEMENT CHARACTER, it's a sure sign the document is being interpreted as a Unicode encoding (e.g. UTF-8) but contains a byte sequence that is invalid in that encoding.
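To make the two failure modes concrete, here is a minimal Python sketch (not the asker's PHP setup). It assumes the text in question is a single "é"; the variable names are made up for illustration. It encodes that character both ways and then decodes each byte sequence under each interpretation.

```python
# Hypothetical sample text: one accented character.
text = "é"

# The same character stored in two different encodings.
utf8_bytes = text.encode("utf-8")       # two bytes: 0xC3 0xA9
latin1_bytes = text.encode("latin-1")   # one byte:  0xE9

# Page interpreted as Latin-1:
# Latin-1 data looks right, UTF-8 data turns into two wrong characters.
latin1_view_of_latin1 = latin1_bytes.decode("latin-1")
latin1_view_of_utf8 = utf8_bytes.decode("latin-1")

# Page interpreted as UTF-8:
# UTF-8 data looks right, Latin-1 data is an invalid byte sequence and
# is shown as U+FFFD REPLACEMENT CHARACTER.
utf8_view_of_utf8 = utf8_bytes.decode("utf-8")
utf8_view_of_latin1 = latin1_bytes.decode("utf-8", errors="replace")

print(repr(latin1_view_of_latin1), repr(latin1_view_of_utf8))
print(repr(utf8_view_of_utf8), repr(utf8_view_of_latin1))
```

Neither interpretation can render both values correctly, which is why the fix has to bring both sources to the same encoding rather than switch the page back and forth.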
I'll guess that the label is hardcoded in the file but the data in the input field comes from a database. In this case you need to set the connection encoding for the database to return UTF-8.
As to why the file is interpreted as Latin-1 without a BOM and as UTF-8 with a BOM: the browser recognizes the BOM as signifying UTF-8, and without it, it defaults to Latin-1. You need to set the correct HTTP header to tell the browser what encoding the file is in, and get rid of the BOM.
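As a rough illustration of those two steps, the following Python sketch strips a leading UTF-8 BOM from a file's bytes and builds a `Content-Type` header that declares the charset explicitly. The helper names `strip_utf8_bom` and `content_type_header` are hypothetical, and how the header actually gets sent depends on your server or framework.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def content_type_header(charset: str = "utf-8") -> str:
    """Build the header line that tells the browser how to decode the page."""
    return f"Content-Type: text/html; charset={charset}"


# Hypothetical page content saved by an editor as "UTF-8 + BOM".
page_with_bom = codecs.BOM_UTF8 + "<label>é</label>".encode("utf-8")

body = strip_utf8_bom(page_with_bom)
header = content_type_header()

print(header)
print(body.decode("utf-8"))
```

With the charset declared in the header, the browser no longer has to guess from the BOM or fall back to its default.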
Read these resources:
Upvotes: 1