Reputation: 513
I have read a couple of similar questions on here about this but didn't find what I was looking for. What is the simplest way to ensure that, no matter what, the text input inside a form is Unicode? I am using Django and a lot of front-end JavaScript, which seems to me the best place to do this. I could do it myself, but I am afraid that the way I plan on doing it is not the best way possible.
Upvotes: 1
Views: 353
Reputation: 536349
The content of web browser form fields is natively Unicode; there is nothing you could put in a form that would not be Unicode.
There are some checks you might want to do to ensure that you don't have control characters, explicit non-characters, characters denoted by Unicode/W3 as “unsuitable for use in markup”, or invalid use of surrogates, but those are checks you'd have to do on the server side. You have to do validity checking on the server side anyway, and there is no benefit to also checking for these problems on the client side, as these are not generally things the average user would be able to type by accident.
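As a rough illustration of that kind of server-side check, here is a minimal Python sketch. The names `is_noncharacter` and `find_problem_chars` are made up for this example, and the choice to allow tab, newline and carriage return is an assumption. It covers control characters, non-characters and lone surrogates only, not the full “unsuitable for use in markup” list.

```python
import unicodedata


def is_noncharacter(cp):
    # Non-characters are U+FDD0..U+FDEF plus the last two code points
    # of every plane (U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, ...).
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def find_problem_chars(text, allowed_controls="\t\n\r"):
    """Return (index, reason) pairs for suspicious characters in text.

    Hypothetical helper: allowed_controls is an assumption, adjust it
    to whatever your fields should accept.
    """
    problems = []
    for index, ch in enumerate(text):
        category = unicodedata.category(ch)
        if category == "Cs":
            # A surrogate code point that is not part of a valid pair.
            problems.append((index, "surrogate"))
        elif is_noncharacter(ord(ch)):
            problems.append((index, "noncharacter"))
        elif category == "Cc" and ch not in allowed_controls:
            problems.append((index, "control"))
    return problems
```

In Django, something like this could probably be called from a form field's validation step, raising a validation error when the returned list is not empty.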
As for checking on the server side that the stream of bytes submitted for the form is converted into a Unicode string in the proper way, that is up to your framework. For example, Django does it using the DEFAULT_CHARSET setting (usually UTF-8).
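To show what "converted in the proper way" means at the byte level, here is a small sketch in plain Python. It is not Django's internal code; `decode_form_bytes` is a hypothetical helper, and the default charset of UTF-8 is an assumption matching the usual setting.

```python
def decode_form_bytes(raw, charset="utf-8"):
    """Decode submitted bytes strictly; return None if they are invalid.

    Hypothetical helper for illustration only. A framework normally
    does the decoding for you before your view sees the data.
    """
    try:
        # errors="strict" raises instead of silently replacing bad bytes.
        return raw.decode(charset, errors="strict")
    except UnicodeDecodeError:
        return None
```

Once the bytes have been decoded, the value you work with is a Unicode string, and any further checks operate on characters rather than bytes.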
Upvotes: 2