Mack
Mack

Reputation: 117

Invalid utf8 character string when importing csv file into MySQL database

I use the following command to import data from a .csv file into a MySQL database table like so:

     String loadQuery = "LOAD DATA LOCAL INFILE '" + file + "' INTO TABLE source_data_android_cell  FIELDS TERMINATED BY ','" + "ENCLOSED BY '\"'"
 + " LINES TERMINATED BY '\n'  " + "IGNORE 1 LINES(.....)" +"SET test_date = STR_TO_DATE(@var1, '%d/%m/%Y %k:%i')";

However, as one of the columns in the sourcefile contains a really screwy data which is: viva Y31L.RastaMod䋢_Version the program refuses to import the data into MySQL and keeps throwing this error:

java.sql.SQLException: Invalid utf8 character string: 'viva Y31L.RastaMod'

I searched up on this but cant really understand what exactly the error was, other than that the INPUT format of this string "viva Y31L.RastaMod䋢_Version" was wrong and didn't fit the utf8 format used in the MySQL database?

However, I already did the following which is SET NAMES UTF8MB4 in my MySQL db, since it was suggested in other questions that UTF8MB4 was more flexible in accepting weird characters.

I explored this further by manually inserting that weird data into MySQL database table in the Command Prompt, which worked fine. In fact, the table displayed almost the full entry: viva Y31L.RastaMod?ã¢_Version. But if I ran my program from the IDE the file gets rejected.

Would appreciate any explanations.

Second minor question related to the import process of csv file into mySQL:

I noticed that I couldn't import a copy of the same file into the MySQL database. Errors thrown included that the data was a duplicate. Is that because MySQL rejects duplicate column data? But when I changed all the data of one column leaving the rest the same in that copied file, it gets imported correctly. Why is that so?

Upvotes: 0

Views: 3457

Answers (1)

Tim Biegeleisen
Tim Biegeleisen

Reputation: 520978

I don't think this immediate error has to do with the destination of the data not being able to cope with UTF-8 characters, but rather the way you are using LOAD DATA. You can try specifying the character set which should be used when loading the data. Consider the following LOAD DATA command, which is what you had originally but slightly modified:

LOAD DATA LOCAL INFILE path/to/file INTO TABLE source_data_android_cell
CHARACTER SET utf8
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES(.....)
SET test_date = STR_TO_DATE(@var1, '%d/%m/%Y %k:%i')

This being said, you should also make sure that the target table uses a character set which supports the data you are trying to load into it.

Upvotes: 1

Related Questions