Reputation: 439
I have some utf-8 strings in my database, they are stored as varbinary. (Generally, it's mediawiki database, but that's not important, i think). I found that some strings are not in a good shape, then i make
SELECT log_comment, CONVERT( log_comment
USING utf8 ) AS
COMMENT
FROM `logging`
WHERE log_id = %somevalue%
i have output table in phpmyadmin like this:
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| d093d09ed0a1d0a220d0a020d098d0a1d09e2fd09cd0add09a20393239342d39332e20c2abd098d0bdd184d0bed180d0bcd0b0d186d0b8d0bed0bdd0bdd0b0d18f20d182d0b5d185d0bdd0bed0bbd0bed0b3d0b8d18f2e2e2e |NULL |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
What i need is to make this string readible, or upload new string with correct data. But this is varbinary field, how can i manage data inside it?
UPD: found that phpmyadmin automatically added 2e2e2e for three dots at the end of each line - they were too long to show. Original binary data are, if somebody interested,
d09fd0a02035302e312e3031392d3230303020d09ed181d0bdd0bed0b2d0bdd18bd0b520d0bfd0bed0bbd0bed0b6d0b5d0bdd0b8d18f20d0b5d0b4d0b8d0bdd0bed0b920d181d0b8d181d182d0b5d0bcd18b20d0bad0bbd0b0d181d181d0b8d184d0b8d0bad0b0d186d0b8d0b820d0b820d0bad0bed0b4d0b8d180d0bed0b2d0b0d0bdd0b8d18f20d182d0b5d185d0bdd0b8d0bad0be2dd18dd0bad0bed0bdd0bed0bcd0b8d187d0b5d181d0bad0bed0b920d0b820d181d0bed186d0b8d0b0d0bbd18cd0bdd0bed0b920d0b8d0bdd184d0bed180d0bcd0b0d186d0b8d0b820d0b820d183d0bdd0b8d184d0b8d186d0b8d180d0bed0b2d0b0d0bdd0bdd18bd1
anyway those strings contains non-utf symbols at the line end, as it seems from
SELECT log_comment,CAST(log_comment AS CHAR CHARACTER SET utf8) AS COMMENT
FROM `logging`
WHERE log_id = %somevalue%
because last symbol is � - for me it seems as black rhomb with white question in it, and last 20-30 characters are missing
Upvotes: 1
Views: 2146
Reputation: 439
As it was said in Joni's comment,
"The length of the text is exactly 255 bytes, which is the limit of a MySQL tinytext/tinyblob field, and also often used by programmers as the size for varchar/varbinary. It looks like your original data has been clipped. The last D1 in your original data starts a new UTF-8 character, but the second byte is missing; that's why the last character is broken in the converted text."
In the MediaWiki DB in the field [log_comment] of the table [logging] should be stored headers of pages that was altered. Some of them appeared to be longer than 255 symbols, so while being logged they were clipped. That confused me; I thought that there was kind of database error, so i should just alter those strings - add to them missing symbols. Now i see it is slightly possible, so i just can gather necessary information from other fields.
Upvotes: 1
Reputation: 2246
SELECT log_comment,CAST(log_comment AS CHAR CHARACTER SET utf8) AS COMMENT
FROM `logging`
WHERE log_id = %somevalue%
Upvotes: 2
Reputation: 13506
try this:
SELECT log_comment,
CONVERT(log_comment,VARCHAR(65535)) AS COMMENT
FROM `logging`
WHERE log_id = %somevalue%
Upvotes: 0