chubbyk

Reputation: 6302

How to detect rows with Chinese characters in MySQL?

How can I detect and delete rows with Chinese characters in MySQL?

Upvotes: 7

Views: 8082

Answers (3)

Sheldon

Reputation: 541

Here is a table, Chinese_Test, that contains Chinese characters, as shown in phpMyAdmin (screenshots of the table's data and structure omitted).

Note that the column's collation is utf8, so let's look at how Chinese characters are encoded in UTF-8: http://www.ansell-uebersetzungen.com/gbuni.html

In that table, the UTF-8 encoding of each Chinese character starts with a byte from E4 to E9, so we can match on the hex representation of the column:

SELECT number
FROM Chinese_Test
WHERE HEX(contents) REGEXP '^(..)*(E[4-9])';

The query returns the rows whose contents contain Chinese characters (result screenshot omitted).

Upvotes: 13

Pekka

Reputation: 449783

I don't have an answer, but to provide you with a starting point: Chinese characters occupy certain blocks of the Unicode character set, for example the CJK Unified Ideographs block (U+4E00 to U+9FFF).

You would have to query for rows that contain characters between the first and the last point of that block. I can't think of a way to automate this though (i.e. to query for characters inside a certain range without naming each character explicitly).
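
A range check of this kind can be automated once the text is available as code points. Here is a minimal Python sketch, assuming that "Chinese characters" means only the CJK Unified Ideographs block (U+4E00 to U+9FFF); the function name is hypothetical, and other CJK blocks (extensions, punctuation) are deliberately ignored:

```python
# Assumed block boundaries: CJK Unified Ideographs only.
CJK_FIRST = 0x4E00
CJK_LAST = 0x9FFF


def contains_cjk_ideograph(text: str) -> bool:
    """Return True if any character of text lies in U+4E00..U+9FFF."""
    return any(CJK_FIRST <= ord(ch) <= CJK_LAST for ch in text)


if __name__ == '__main__':
    for sample in ['中文', 'row 42', 'かな']:
        print(repr(sample), contains_cjk_ideograph(sample))
```

Comparing code points avoids naming each character explicitly; the cost is that the block boundaries have to be chosen up front.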

Another untested idea that comes to mind is using iconv() to convert the string to a specifically Chinese encoding, using //IGNORE, and seeing whether any data is left. If anything is left, the string may contain Chinese characters, although this would probably be disrupted by any numbers inside the string.

It's an interesting problem.

Upvotes: 0

hjpotter92

Reputation: 80657

If all the other rows contain alphanumeric values, try the following, which deletes every row whose column contains none of the listed ASCII characters:

DELETE FROM tableName WHERE NOT columnToCheck REGEXP '[A-Za-z0-9.,-]';

Do check the results before deletion, using the following:

SELECT * FROM tableName WHERE NOT columnToCheck REGEXP '[A-Za-z0-9.,-]';

Upvotes: 0
