Reading and parsing a Shift-JIS file in a UTF-8 JVM

Question

I have a Japanese client that provides a data feed file in Shift-JIS encoding (containing both kana and kanji characters).

I have to upload the data from that Shift-JIS feed file into my web application, whose JVM is started with UTF-8 as the default encoding (-Dfile.encoding=UTF-8).

The application parses and identifies the various data fields in the feed file by character length.

For example, FirstName [Length=30 Characters][Starting Position=11][Ending Position=40].
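To make the position-based parsing concrete, here is a minimal sketch of that kind of field extraction. The class and method names are hypothetical and not taken from the real application; positions are 1-based and inclusive, as in the FirstName example, and are counted in Java chars after the line has been decoded into a String.

```java
class FixedWidthField {
    // Returns the field occupying 1-based inclusive positions start..end.
    // Assumption: the line is already a decoded String, so positions count
    // Java chars, not bytes of the original file.
    static String field(String line, int start, int end) {
        if (line.length() < start) {
            return ""; // line is too short to contain this field
        }
        int to = Math.min(end, line.length());
        // trim() drops the space padding of the fixed-width field
        return line.substring(start - 1, to).trim();
    }

    public static void main(String[] args) {
        // 10 characters of leading data, then a 30-character FirstName field
        String line = "ID00000001" + String.format("%-30s", "Taro") + "Yamada";
        System.out.println("[" + field(line, 11, 40) + "]");
    }
}
```

Extraction like this only lines up if every character of the file was decoded correctly before the positions are counted.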

The application parses UTF-8 feed files (which contain only English characters) without any issues.

However, when trying to upload the Shift-JIS Japanese feed file, the fields are not identified correctly.

If I change the web application JVM startup option to Shift-JIS (-Dfile.encoding=SJIS), then the Japanese Shift-JIS feed file is parsed successfully.

The problem is that changing the JVM encoding in the live environment is not possible.

I assume it is the difference in multi-byte representation between UTF-8 and Shift-JIS that causes the web application to fail to parse the Shift-JIS feed file in a UTF-8 JVM.
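A small sketch of the difference I mean, assuming the JVM ships the Shift_JIS charset (standard JDK builds do). The class and method names are hypothetical, and the sample text is the two kanji U+5C71 U+7530, written as Unicode escapes so the source file encoding does not matter.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingLengths {
    // Assumption: the Shift_JIS charset is available in this JVM.
    static final Charset SJIS = Charset.forName("Shift_JIS");

    static int sjisLength(String s) {
        return s.getBytes(SJIS).length;
    }

    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Simulates reading Shift-JIS bytes as if they were UTF-8.
    static String misdecodeAsUtf8(String s) {
        return new String(s.getBytes(SJIS), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String name = "\u5C71\u7530"; // two kanji
        System.out.println("chars:           " + name.length());
        System.out.println("Shift_JIS bytes: " + sjisLength(name)); // 2 bytes per kanji
        System.out.println("UTF-8 bytes:     " + utf8Length(name)); // 3 bytes per kanji
        System.out.println("intact after wrong decode: "
                + misdecodeAsUtf8(name).equals(name));
    }
}
```

The same two characters take 4 bytes in Shift-JIS and 6 bytes in UTF-8, and Shift-JIS bytes decoded as UTF-8 do not come back as the original text, which would be enough to shift every field position after the first Japanese character.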

Is there any way I can convert the characters in the Shift-JIS feed file to their UTF-8 equivalents? Basically, the Japanese characters in Shift-JIS must be converted to the same Japanese characters in UTF-8.

The web application's back end is a PostgreSQL database with UTF8 encoding.
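The kind of conversion I have in mind would look roughly like the sketch below. It is only an illustration under assumptions: the names are hypothetical, the whole file is assumed to fit in memory, and the charset is named explicitly so the result should not depend on -Dfile.encoding.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class SjisToUtf8 {
    // Assumption: the Shift_JIS charset is available in this JVM.
    static final Charset SJIS = Charset.forName("Shift_JIS");

    // Decodes Shift-JIS bytes into a String, then re-encodes it as UTF-8.
    static byte[] transcode(byte[] sjisBytes) {
        String text = new String(sjisBytes, SJIS);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Converts a whole file; reads it fully into memory first.
    static void convertFile(Path in, Path out) throws IOException {
        Files.write(out, transcode(Files.readAllBytes(in)));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: SjisToUtf8 <shift-jis-input> <utf8-output>");
            return;
        }
        convertFile(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

Whether the parser should read a converted copy like this, or decode the original file with an explicit charset at the point where it is opened, is part of what I am unsure about.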
