Reputation: 129
I receive a data file in ETL from the client and we load the data into Mysql database using Load Data file functionality and use CHARACTER SET as utf8.
LOAD DATA LOCAL INFILE '${filePath}'
INTO TABLE test_staging
CHARACTER SET 'utf8'
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n'
(${testcolumns}) SET
first_name = @first_name;
1|"test"|"name"|2
2|"asdf"|asdf&test|2
3|fun|value|2
When I load the above data into the database and it is inserting directly as strings instead of converting to html characters
id first_name last_name
1 "test" "name"
2 "asdf" asdf&test
3 fun value
I tried changing the CHARACTER SET value from utf8 to latin1 but the result is same.
I also tried replacing the special characters while loading the data into database but the issue is, I receive all types of html characters data in the file. I cannot keep on adding the replace function for all of them.
LOAD DATA LOCAL INFILE '${filePath}'
INTO TABLE test_staging
CHARACTER SET 'utf8'
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n'
(${testcolumns}) SET
first_name = REPLACE(REPLACE(REPLACE(first_name,''','\''),'"','"'),'&','&');
Is there any character set which converts the html data and loads correctly?
id first_name last_name
1 "test" "name"
2 "asdf" asdf&test
3 fun value
Any help is appreciated... Thanks
Upvotes: 1
Views: 257
Reputation: 1367
The problem you are facing is not about character set. It happens because the software that your client use intentionally converts HTML special characters to their codes.
It is probably possible to convert them back using MySQL though I couldn't find a quick solution, but as you are handling this data with ETL the better option seems to be to use the external tool before you insert the data into the database. One of these for example:
etc.
or something else depending on what tools you have available in your system or which ones you can afford to install.
Upvotes: 1