Reputation: 1049
I have a csv file containing international text as shown below:
+0000000000000010003.,+0000000000000000103.,+0526640777496331405.,+0000000000000000019.,"¿¿¿¿¿¿"
+0000000000000010020.,+0000000000000000120.,+0526640777496331405.,+0000000000000000019.,"¿¿¿¿¿¿¿¿"
After I upload the file to the database server via FTP, I'm seeing some junk characters:
ÿÅ+0000000000000010003.,+0000000000000000103.,+0526640777496331405.,+0000000000000000019.,"³0¢0°0ë0ü0Ã0"
+0000000000000010020.,+0000000000000000120.,+0526640777496331405.,+0000000000000000019.,"Ã0ë0·0ü0¦0§0¤0Ã0"
I then tried using the iconv command to fix the contents of the file:
iconv -f ISO8859-9 -t UTF-8 test/sample_cat_master.csv > test/sample_cat_master_test.csv
It didn't work, and I still see the junk characters.
Importing that file into Db2 produces the following message:
SQL3110N The utility has completed processing. "0" rows were read
from the input file.
SQL3221W ...Begin COMMIT WORK. Input Record Count = "0".
SQL3222W ...COMMIT of any database changes was successful.
SQL3149N "0" rows were processed from the input file. "0" rows were
successfully inserted into the table. "0" rows were rejected.
Upvotes: 0
Views: 1138
Reputation: 5332
The file is being corrupted by an improper code page translation, so you'll need to determine where and how it occurs in order to prevent it. Viewing or editing the file with Linux/UNIX utilities may also translate the file's UTF-8 characters if the locale in your session is not set to a UTF-8 code page.
Before getting the database involved, try transferring the file via FTP in binary mode to preserve the UTF-8 encoding and avoid an unwanted code page conversion. The od utility is particularly useful for examining the contents of a binary file or a text file that uses a different code page. If od is not showing valid multi-byte sequences for the UTF-8 characters, then there's no chance that the database is going to treat the UTF-8 data properly, either.
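As a rough sketch of the od check: the snippet below writes a hypothetical one-character sample (U+30B3 encoded as UTF-8; both the path and the character are my assumptions, not taken from your file) and dumps its bytes in hex. On the real file, you would run the same od command before and after the transfer and compare the two dumps. In a command-line ftp client, binary mode is normally selected by issuing `binary` before `put`.

```shell
#!/bin/sh
# Hypothetical sample file: U+30B3 encoded as UTF-8, followed by a newline.
# Replace the path with your own CSV when checking the real data.
sample=/tmp/utf8_sample.txt
printf '\343\202\263\n' > "$sample"

# -An suppresses the offset column; -tx1 prints each byte as two hex digits.
# tr and sed only normalize whitespace so the dump is easy to compare.
dump=$(od -An -tx1 "$sample" | tr -s ' \n' ' ' | sed 's/^ //; s/ $//')
echo "$dump"
```

A valid UTF-8 multi-byte sequence starts with a lead byte in the range c2 to f4, followed by continuation bytes in the range 80 to bf. If the dump of your file instead begins with ff fe, or every second byte is 00 or the same repeated value, the file may actually be UTF-16 rather than UTF-8, and the source encoding passed to iconv would need to match that.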
Which code page was your DB2 database built to use? If it is not 1208 (UTF-8), you'll probably encounter additional translation issues when using the IMPORT utility. You may also need to set DB2CODEPAGE to 1208 in your client environment and DB2 registry, and to specify codepage=1208 in the MODIFIED BY clause of your IMPORT statement.
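Put together, those settings might look something like the sketch below. The database name and table name are placeholders I made up, and the input path is the one from your iconv command, so adjust all three. The script only calls db2 if it is on the PATH; otherwise it prints the IMPORT statement it would have run.

```shell
#!/bin/sh
# Hypothetical names: substitute your own database and target table.
db_name=SAMPLE
table=MYSCHEMA.CAT_MASTER
csv=test/sample_cat_master.csv

# codepage=1208 tells IMPORT that the delimited input file is UTF-8.
import_cmd="IMPORT FROM $csv OF DEL MODIFIED BY codepage=1208 INSERT INTO $table"

if command -v db2 >/dev/null 2>&1; then
    # Set the client code page in the DB2 registry, then end the CLP
    # back-end process so the next connection picks up the new value.
    db2set DB2CODEPAGE=1208
    db2 terminate

    db2 connect to "$db_name"
    # Show which code page the database itself was created with.
    db2 get db cfg for "$db_name" | grep -i 'code page'
    db2 "$import_cmd"
else
    echo "db2 not found; would run: $import_cmd"
fi
```

If the database code page reported by the configuration output is not 1208, the import may still convert the data on the way in, so check that value first.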
Upvotes: 1