Barry Fruitman

Reputation: 12656

How do I import this file into MySQL?

I'm trying to import this Hungarian text file into a MySQL table, but the Hungarian characters are always corrupted. I've tried many encodings for both the import file and the table, but I haven't found the right combination.

The file format is one word and one number per line, separated by a space. My import table has two columns, a varchar and an integer. I'm using MySQL 5.5.16 and phpMyAdmin 3.4.5. A phpMyAdmin solution is preferred, but I can use the command line if necessary.

Thanks in advance!

EDIT: Broken link above fixed

Upvotes: 0

Views: 1938

Answers (2)

eggyal

Reputation: 125835

Your file appears to be encoded in UTF-8. For example:

$ unzip -p hu_50K.zip | sed -n 59p | xxd
0000000: 6bc3 b673 7ac3 b66e c3b6 6d20 3532 3030  k..sz..n..m 5200
0000010: 310d 0a                                  1..

I understand that "köszönöm" is Hungarian for "thank you". If that is what row 59 of the file is supposed to contain, then the ö character (U+00F6) is encoded as 0xc3b6, which is UTF-8.
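As a quick cross-check of that reading, here is a small Python sketch that decodes the exact bytes from the xxd output above. The variable names are only illustrative. It also shows what the same bytes look like when they are misread as Latin-1, which is one common way this kind of corruption arises (whether that is what happened in your import is an assumption).

```python
# Bytes copied from the xxd dump of line 59 shown above.
raw = bytes.fromhex("6bc3b6737ac3b66ec3b66d2035323030310d0a")

# Decoding as UTF-8 succeeds; strip the trailing CRLF and split on the space.
line = raw.decode("utf-8")
word, number = line.rstrip("\r\n").split(" ")
print(word, number)

# Misreading the same bytes as Latin-1 turns every two-byte "ö" into two
# separate characters, which is the typical "corrupted" appearance.
mojibake = raw.decode("latin-1").split(" ")[0]
print(mojibake)
```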

To import this file using LOAD DATA INFILE:

LOAD DATA [LOCAL] INFILE '/path/to/hu_50K.txt'
    INTO TABLE my_table
    CHARACTER SET utf8
    FIELDS
        TERMINATED BY ' '
    LINES
        TERMINATED BY '\r\n'
    (col_word, col_number)

Of course, col_word must be able to hold the characters - which it necessarily will if it is also encoded in UTF-8.
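Before running the import, it may be worth confirming that every line really has the shape the LOAD DATA statement above expects: UTF-8 text, one word and one integer separated by a single space, and CRLF line endings. The following is only a rough sketch of such a check. The function name `find_bad_lines` and the sample data are made up for illustration; with the real file you would pass in its raw bytes instead.

```python
def find_bad_lines(data):
    """Return 1-based numbers of lines that are not 'word<space>integer'."""
    bad = []
    # Raises UnicodeDecodeError if the bytes are not valid UTF-8.
    text = data.decode("utf-8")
    for lineno, line in enumerate(text.split("\r\n"), start=1):
        if line == "":
            continue  # empty piece after the final CRLF
        parts = line.split(" ")
        if len(parts) != 2 or not parts[1].isdigit():
            bad.append(lineno)
    return bad


# Hypothetical sample data; the third line is deliberately malformed.
sample = "köszönöm 52001\r\nhogy 1234\r\nbroken-line\r\n".encode("utf-8")
bad = find_bad_lines(sample)
print(bad)
```

For the real file, something like `find_bad_lines(open('/path/to/hu_50K.txt', 'rb').read())` would apply the same check, assuming the file fits comfortably in memory.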

Upvotes: 1

Travis G

Reputation: 1602

Try this script to convert the file to UTF-8:

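import subprocess

f_loc = "my.file"
# Ask the `file` utility to guess the source encoding (e.g. "iso-8859-1").
# If it reports "binary" or "unknown-8bit", open() below will fail and the
# encoding has to be supplied by hand.
f_enc = subprocess.check_output(
    ['file', '-b', '--mime-encoding', f_loc]).decode().strip()
# newline='' keeps the original line endings (e.g. \r\n) untouched.
f_stream = open(f_loc, 'r', encoding=f_enc, newline='')
f_out = open(f_loc + "b", 'w', encoding='utf-8', newline='')
for l in f_stream:
    f_out.write(l)  # re-encode each line as UTF-8
f_stream.close()
f_out.close()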

Once this is done, you can load the converted file into MySQL using LOAD DATA INFILE.

Also ensure that the CHARACTER SET clause of LOAD DATA INFILE is set to the encoding of the file, since it overrides the character_set_database system variable.

Upvotes: 0
