Reputation: 51
I have a problem with MySql I have a table with parsed informations from websites. A strange string interpretation appear:
the query
select id, address from pagesjaunes_test where address = substr(address,1,length(address)-1)
return a set of values instead of none
at beginning I executed functions as:
address = replace(address, '\n', '')
address = replace(address, '\t', '')
address = replace(address, '\r', '')
address = replace(address, '\r\n', '')
address = trim(address)
but the problem still persist.
Values of field 'address' have some french chars , but the query returned also values that contains only alfanumeric english chars.
Another test: I tried to check the length of strings and ... the strlen() from PHP and LENGTH() from MYSQL display different results! Somewhere difference is by 2 chars, somewhere by 1 character without a specific "rule".
Visual I can't see any space or tabs or something else.
After I modified an address manualy(I deleted all string and I wrote it again), the problem is solved, but I have ~ 6000 values, so this is not a solution :)
What can be the problem?
I suppose that strings can have something as an "empty char", but how to detect and remove it?
Thanks
P.S. the problem is not just length. I need to join this table with other one and using a condition that check if values from fields 'address' are equals. Even if the fields have the same collation and tables have the same collation, query returns that no addresses match
E.g.
For query:
SELECT p.address,char_length(p.address) , r.address, char_length(r.address)
FROM `pagesjaunes_test` p
LEFT JOIN restaurants r on p.name=r.name
WHERE
p.postal_code=r.postal_code
and p.address!=r.address
and p.phone=''
and p.cuisines=''
LIMIT 10
So: p.address!=r.address
The result is:
+-------------------------------------+------------------------+--------------------------+------------------------+ | address | char_length(p.address) | address | char_length(r.address) | +-------------------------------------+------------------------+--------------------------+------------------------+ | Dupin Marc13 quai Grands Augustins | 34 | 13 quai Grands Augustins | 24 | | 39 r Montpensier | 16 | 39 r Montpensier | 16 | | 8 r Lord Byron | 14 | 3 r Balzac | 10 | | 162 r Vaugirard | 15 | 162 r Vaugirard | 15 | | 32 r Goutte d'Or | 16 | 32 r Goutte d'Or | 16 | | 2 r Casimir Périer | 18 | 2 r Casimir Périer | 18 | | 20 r Saussier Leroy | 19 | 20 r Saussier Leroy | 19 | | Senes Douglas22 r Greneta | 25 | 22 r Greneta | 12 | | Ngov Ly Mey44 r Tolbiac | 23 | 44 r Tolbiac | 12 | | 33 r N-D de Nazareth | 20 | 33 r N-D de Nazareth | 20 | +-------------------------------------+------------------------+--------------------------+------------------------+
As you see, "162 r Vaugirard", "20 r Saussier Leroy" contains only ASCII chars, have the same length but aren't equals!
Upvotes: 0
Views: 802
Reputation: 51
Finally, I found the problem. After changed collation to ascii_general_ci all non-ascii chars was transformed to "?". Some spaces also was replaced with "?". After check initial values, function ORD() from MySQL returned 160 (instead of 32) for these spaces. So,
UPDATE pagesjaunes_test SET address = TRIM(REPLACE(REPLACE(address, CHAR(160), ' '), ' ',' ')
resolved my question.
Upvotes: 0
Reputation: 43474
The official documentation says:
Returns the length of the string str, measured in bytes. A multi-byte character counts as multiple bytes. This means that for a string containing five two-byte characters, LENGTH() returns 10, whereas CHAR_LENGTH() returns 5.
So, use CHAR_LENGTH
instead :)
select id, address from pagesjaunes_test
where address = substr(address, 1, char_length(address) - 1)
Upvotes: 1
Reputation: 35
Maybe have a look at the encoding of the mysql text fields - UTF8 encodes most of its characters with 2 bytes - only a small subset of UTF8 (ASCII characters for example) get encoded with one byte.
MySQL knows UTF8 and counts right. PHP text functions aren't UTF8 aware and count the bytes itself.
So if PHP counts more than MYSQL, this is probably the cause and you could have a look at utf8decode.
br from Salzburg!
Upvotes: 3