Daric
Daric

Reputation: 16789

Mysql Select with LIKE clause is not working Chinese characters

I have data stored in single column which are in English and Chinese.

the data is separated by the separators e.g. for Chinese

<!--:zh-->日本<!--:-->

for English

<!--:en-->English Characters<!--:-->

I would show the content according to users selected language.

I made a query like this

SELECT * FROM table WHERE content LIKE '<!--:zh-->%<!--:-->' 

The query above works but return empty result set.

Collation of content column is utf8_general_ci

I have also tried using the convert function like below

SELECT * FROM table WHERE CONVERT(content USING utf8) 
                           LIKE CONVERT('<!--:zh-->%<!--:-->' USING utf8)

But this also does not work.

I also tried running the query SET NAMES UTF8 but still it does not work.

I am running queries in PhpMyAdmin if it does matter.


qTranslate did not change the database used by WordPress. Translation data is stored in original fields. For that reason there is each field containing all translations for that special field and the data is like this

<!--:en-->English Characters<!--:--><!--:zh-->日本<!--:-->

http://wpml.org/documentation/related-projects/qtranslate-importer/

Upvotes: 10

Views: 3780

Answers (6)

Devart
Devart

Reputation: 122042

I have tried to reproduce the problem. The query is OK, I have got the result, even using SET NAMES latin1.

Check the content of the field, possible there are beginning/ending white spaces, remove them firstly, or try this query -

SELECT * FROM table
WHERE TRIM(content) LIKE '<!--:zh-->%<!--:-->'

Example with your string -

CREATE TABLE table1(
  column1 VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_general_ci
);

INSERT INTO table1 VALUES
  ('<!--:en-->English Characters<!--:--><!--:zh-->日本<!--:-->');

SELECT * FROM table1 WHERE column1 LIKE '%<!--:zh-->%<!--:-->';

=>  <!--:en-->English Characters<!--:--><!--:zh-->日本<!--:-->

Upvotes: 2

Panda Pajama
Panda Pajama

Reputation: 1441

As others have pointed, your queries seem to be fine, so I'd look somewhere else. This is something you can try:

I'm not sure about chinese input, but for japanese, many symbols have full-width and half-width variants, for example: "hello" and "hello" look similar, but the codepoints of their characaters are different, and therefore won't compare as equal. It's very easy to mistype something in full-width, and very difficult to detect, especially for whitespace. Compare " " and " ".

You are probably storing your data in half width and querying it in full width. Even if one character is different (especially spaces are difficult to detect), the query will not find your desired data.

There are many ways to detect this, for instance try copying the data and query into text files verbatim, and view them with hex editors. If there is a single bit difference in the relevant parts, you may be dealing with this problem.

Upvotes: 3

Matzo
Matzo

Reputation: 94

Can I ask what version of MySQL you're using? From what I see your code seems fine, which gets me thinking you're not running the most up to date version of MySQL.

Upvotes: 0

Alexander Taver
Alexander Taver

Reputation: 474

Search fails because of the way you store data. You are using utf8_general_ci collation, which is tailored to fast search in some European languages. It is even not so perfect with some of them. People tend to use it just because it fast and they don't care about some search inaccuracy in, say, Scandinavian languages. Change this to big5_chinese_ci or some other Chinese - tuned collation.

UPD. Another thing. I see, you use a kind of markup in your DB records.

<!--:zh-->日本<!--:-->
<!--:en-->English Characters<!--:-->

So, if you're searching for Chinese, you may just use

SELECT * FROM table WHERE content LIKE '<!--:zh-->%' 

instead of

SELECT * FROM table WHERE content LIKE '<!--:zh-->%<!--:-->' 

Upvotes: 2

Sir Rufo
Sir Rufo

Reputation: 19116

Test table data for content

<!--:zh-->日本<!--:--><!--:en-->English Characters<!--:-->
<!--:en-->English Characters<!--:--><!--:zh-->日本<!--:-->
<!--:zh-->日本<!--:-->
<!--:en-->English Characters<!--:-->

followed by

I have data stored in single column which are in English and Chinese

and your select should look like this

SELECT * FROM tab
WHERE content LIKE '%<!--:zh-->%<!--:-->%'

SQL Fiddle DEMO (also with demo how to get the special language part out of content)

SET @PRE = '<!--:zh-->', @SUF = '<!--:-->';

SELECT
  content,
  SUBSTR( 
    content, 
    LOCATE( @PRE, content ) + LENGTH( @PRE ), 
    LOCATE( @SUF, content, LOCATE( @PRE, content ) ) - LOCATE( @PRE, content ) - LENGTH( @PRE ) 
  ) langcontent
FROM tab
WHERE content LIKE CONCAT( '%', @PRE, '%', @SUF, '%' );

as stated in MySQL Documentation and follow the example of

SELECT 'David!' LIKE '%D%v%';

Upvotes: 4

Anton
Anton

Reputation: 4018

Assuming you're using MySQL, you can use wildcards in LIKE:

  • % matches any number of characters, including zero characters.
  • _ matches exactly one character

Here's an example search for values containing the character 日 in the content column of your table:

SELECT * FROM table WHERE `content` LIKE '%日%'

Upvotes: 2

Related Questions