Reputation: 2879
When storing data in MySQL using the UTF-8 charset, does it make sense to escape characters as entities when the data is being input, or is it better to store it in raw form and transform it when pulling it out?
For instance, let's say someone enters a bullet (•) character into a text box. When saving that data, should it be converted to &bull; before being input? Or would it make sense to enter it as a bullet, then convert it when pulling it out?
I guess I'm just not sure of the best practices for storing non-ASCII data. Any thoughts would be greatly appreciated.
Upvotes: 1
Views: 4045
Reputation: 3892
Consider that the database can host data for multiple applications.
In that environment, what counts as a string is defined by the database, not the application. Make your application conform to the data standards and make the conversions explicit in your data layer.
For example, if the database is a newer schema and the DBA has defined that strings will be stored in UTF-8, then all strings passed from your application should be UTF-8.
If, however, the database is a legacy system and the target for your data is an 8 bit character set, then do the conversion in your application to the appropriate code page and/or fail when you encounter a non-conforming value.
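As a rough sketch of the "convert or fail" approach for a legacy 8-bit target, assuming the column uses Latin-1 (the helper name to_legacy_bytes and the choice of codec are only illustrative, not anything from a real schema):

```python
def to_legacy_bytes(text: str, codec: str = "latin-1") -> bytes:
    """Encode text for an 8-bit column, failing loudly on unsupported characters."""
    try:
        # str.encode() uses errors="strict" by default, so nothing is silently dropped
        return text.encode(codec)
    except UnicodeEncodeError as exc:
        bad = text[exc.start:exc.end]
        raise ValueError(f"{bad!r} cannot be stored as {codec}") from exc


print(to_legacy_bytes("café"))  # every character exists in Latin-1

try:
    to_legacy_bytes("• item")  # the bullet (U+2022) has no Latin-1 code point
except ValueError as err:
    print("rejected:", err)
```

Doing this in one place in the data layer keeps the conversion explicit, so a non-conforming value is rejected before it reaches the database rather than stored in a mangled form.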
Most newer database schemas that interact with the web should standardise on UTF-8 or UTF-16. If you are building the database, start with localising it first and then, once you've decided on the internal string representations, force all the applications that write to it to conform to your standards.
Upvotes: 0
Reputation: 401182
If you are using the UTF-8 charset for your whole application (i.e. MySQL, but also the encoding of your HTML pages, your scripts, code, and all that), there is no need to transform "special characters" into entities: just send your text data as UTF-8 too ;-)
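To illustrate the round trip, here is a small sketch that stores the raw bullet and reads it back unchanged. It uses Python's built-in sqlite3 with an in-memory database purely as a stand-in for MySQL (the table and column names are made up); with MySQL you would additionally have to make sure the connection and the columns use a UTF-8 charset.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for MySQL
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (body TEXT)")

raw = "• first item"  # the bullet is stored as-is, no entity conversion

# A parameterized query passes the text through untouched
conn.execute("INSERT INTO notes (body) VALUES (?)", (raw,))
stored = conn.execute("SELECT body FROM notes").fetchone()[0]

print(stored == raw)  # the text comes back exactly as it went in
conn.close()
```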
Upvotes: 6
Reputation: 105914
Store the data as-is. Perform any conversions necessary for display at run-time.
Because if you store it as HTML (with entities), you create several issues, for example with column sizes such as varchar(255), or with the use of SQL string functions like substring() or reverse().
Upvotes: 3
Reputation: 24587
The purpose of escaping is to transmit data over a channel that does not allow certain characters. Since a UTF-8 database can handle UTF-8 characters just fine, you have no reason to escape anything for storage. In fact, since escaped text is harder to manipulate (string functions will not work properly, for instance), it is usually advised not to perform unnecessary escaping.
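A quick sketch of why escaped text is harder to manipulate, and of escaping only at output time instead. The sample strings and the render_html helper are made up for illustration; the numeric character reference step is only needed if the output channel really cannot carry UTF-8.

```python
import html

raw = "• done"           # stored as-is
escaped = "&bull; done"  # the same text stored with an entity

# Length, slicing and reversing only behave sensibly on the raw text
print(len(raw), len(escaped))      # 6 11
print(raw[:1], escaped[:1])        # • &
print(raw[::-1], escaped[::-1])    # enod • enod ;llub&


def render_html(text: str) -> str:
    """Escape at display time: HTML metacharacters first, then non-ASCII."""
    safe = html.escape(text)  # handles &, <, > and quotes
    # Optional: replace non-ASCII characters with numeric character references
    return safe.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(render_html("• <b>"))  # &#8226; &lt;b&gt;
```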
Upvotes: 0