Rippo

Unicode issue when inserting ZERO WIDTH SPACE into a database

I am using CKEditor, and it seems that a certain combination of keypresses inserts the following Unicode character into the textarea:

U+200B ZERO WIDTH SPACE (UTF-8 bytes: \xe2\x80\x8b)

When I try to save this into a MySQL database, I get the following error:

MySql.Data.MySqlClient.MySqlException
Incorrect string value: '\xE2\x80\x8B </...' for column 'Content' at row 1
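The bytes quoted in the error, \xE2\x80\x8B, are just the UTF-8 encoding of U+200B, so MySQL is rejecting that single invisible character. A minimal C# sketch to confirm this (the class `ZeroWidthSpaceBytes` and its members are hypothetical names used only for illustration):

```csharp
using System;
using System.Text;

public static class ZeroWidthSpaceBytes
{
    // U+200B ZERO WIDTH SPACE: invisible when rendered, but still a real character.
    public const string Zwsp = "\u200B";

    // Encodes the string as UTF-8 and formats the bytes as hex pairs, e.g. "E2-80-8B".
    public static string Utf8Hex(string s) =>
        BitConverter.ToString(Encoding.UTF8.GetBytes(s));
}
```

`ZeroWidthSpaceBytes.Utf8Hex(ZeroWidthSpaceBytes.Zwsp)` returns "E2-80-8B", which matches the bytes in the exception message.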

As far as I can see, I have several options:

  1. Change the collation on my table. However, I am not entirely sure what impact this would have on my C# MVC4 application, which uses NHibernate as the ORM.
  2. Strip the offending Unicode characters out of the string before inserting it into the database. However, I am not entirely sure how to do this, or even whether it is the right approach.
  3. This seems to be a bug in CKEditor in certain browsers, but I would like to future-proof my application rather than wait for a fix.
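For option 2, a minimal sketch assuming the only offending character is U+200B. `ContentSanitizer` and `StripZeroWidthSpace` are hypothetical names, and where to call the helper (controller, entity setter, etc.) is left to the application:

```csharp
public static class ContentSanitizer
{
    // U+200B ZERO WIDTH SPACE, the character CKEditor inserts in the question.
    private const string ZeroWidthSpace = "\u200B";

    // Hypothetical helper: removes every ZERO WIDTH SPACE from the text.
    // Null or empty input is returned unchanged.
    public static string StripZeroWidthSpace(string input)
    {
        if (string.IsNullOrEmpty(input))
        {
            return input;
        }
        // string.Replace(string, string) performs an ordinal search,
        // so the invisible character is matched exactly.
        return input.Replace(ZeroWidthSpace, string.Empty);
    }
}
```

It would be called just before the entity is saved, e.g. `email.Content = ContentSanitizer.StripZeroWidthSpace(email.Content);` with a hypothetical `email` entity. Any other character that the column's character set cannot represent would still fail the same way, so this treats the symptom rather than the cause.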

So my question is simply: what is the best way to get around this issue?

[Image: table structure]


Answer

Sylvain Leroux

Evidently, your column's character set is Latin1.

You shouldn't try to store Unicode data in a Latin1 column. You will probably have to change the column's character set:

ALTER TABLE campaignemail MODIFY Content LONGTEXT CHARACTER SET utf8;

Beware: if you have previously stored "Unicode pretending to be Latin1" (UTF-8 bytes saved in a Latin1 column), converting the column may garble those values.
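To illustrate what "Unicode pretending to be Latin1" looks like, here is a hedged C# sketch (hypothetical class `Latin1Mojibake`) that reads the UTF-8 bytes of U+200B back as ISO-8859-1. One invisible character becomes three (U+00E2, U+0080, U+008B); in MySQL, whose latin1 is actually cp1252, the same three bytes display as "â€‹". A plain ALTER TABLE converts each of those characters individually, so the garbage is carried over into the new character set rather than repaired:

```csharp
using System.Text;

public static class Latin1Mojibake
{
    // ISO-8859-1 maps each byte 0x00-0xFF to the code point with the same value,
    // so raw bytes survive a decode/encode round trip through it unchanged.
    private static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");

    // Simulates "Unicode pretending to be Latin1": UTF-8 bytes read back as Latin1 text.
    public static string MisreadUtf8AsLatin1(string original) =>
        Latin1.GetString(Encoding.UTF8.GetBytes(original));

    // Undoes the mistake: recover the raw bytes through Latin1, then decode them as UTF-8.
    public static string Repair(string garbled) =>
        Encoding.UTF8.GetString(Latin1.GetBytes(garbled));
}
```

The repair only works while the misread text still holds the original bytes one-to-one; it is an illustration of the problem, not a migration procedure.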


By the way, the character set (charset) is the encoding used to map a "letter" (strictly speaking, a code point) to bytes.
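The code-point-to-bytes mapping can be tested directly. A minimal C# sketch, using .NET's ISO-8859-1 encoding as a stand-in for MySQL's latin1 (which is really cp1252; U+200B exists in neither). `CharsetCheck.CanEncode` is a hypothetical helper:

```csharp
using System.Text;

public static class CharsetCheck
{
    // Returns true if every character of s has a byte mapping in the named charset.
    // The exception fallback makes unmappable characters throw instead of
    // silently becoming '?'.
    public static bool CanEncode(string charsetName, string s)
    {
        Encoding strict = Encoding.GetEncoding(
            charsetName,
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);
        try
        {
            strict.GetBytes(s);
            return true;
        }
        catch (EncoderFallbackException)
        {
            return false;
        }
    }
}
```

`CanEncode("ISO-8859-1", "\u200B")` returns false while `CanEncode("utf-8", "\u200B")` returns true: the character has no byte in Latin1 but three bytes in UTF-8.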

The collation defines the relative order of the various "letters"; it is used when searching and sorting columns.
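Collation, by contrast, governs comparison and ordering rather than storage. A rough C# analogy (hypothetical `CollationDemo.Sort`), using .NET string comparers in place of MySQL collations:

```csharp
using System;
using System.Linq;

public static class CollationDemo
{
    // Same strings, different orderings: choosing between them is what a collation does.
    // StringComparer.Ordinal compares raw UTF-16 code units ('B' = 0x42 < 'a' = 0x61);
    // StringComparer.OrdinalIgnoreCase ignores case, loosely like MySQL's *_ci collations.
    public static string[] Sort(string[] words, StringComparer comparer) =>
        words.OrderBy(w => w, comparer).ToArray();
}
```

Sorting { "b", "B", "a" } with `StringComparer.Ordinal` gives B, a, b, whereas `StringComparer.OrdinalIgnoreCase` gives a, b, B (`OrderBy` is stable, so the tied "b" and "B" keep their input order).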
