fugue
fugue

Reputation: 1

How can I convert corrupted old data or broken charset in database to UTF-8?

I have a database, but the data in it is not UTF-8 compatible and shows Turkish characters broken. I need a detailed help on how to translate data automatically or manually. I've tried everything, but can't figure out how to replace broken characters with compatible ones.

post subjects looks like this

There are too many symbols that I can't replace.

Upvotes: -3

Views: 264

Answers (1)

user1191247
user1191247

Reputation: 12973

You haven't given us much information to work from, but assuming you have been inserting UTF-8 data into latin1 columns over a latin1 connection you can do the following.

Given a posts table like:

CREATE TABLE posts (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    subject VARCHAR(255) NOT NULL,
    body TEXT NOT NULL
) DEFAULT CHARSET=latin1;

You can create a new table like:

CREATE TABLE posts_utf8 (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    subject VARCHAR(255) NOT NULL,
    body TEXT NOT NULL
) DEFAULT CHARSET=utf8mb4;

And then populate it with:

INSERT INTO posts_utf8
SELECT id,
    CONVERT(BINARY subject USING utf8mb4),
    CONVERT(BINARY body USING utf8mb4)
FROM posts;

If you are happy with the content of the new table you can then switch them over:

RENAME TABLE posts TO posts_old, posts_utf8 TO posts;

Upvotes: 0

Related Questions