Sakey

Reputation: 1

Convert from Japanese string to Unicode in Java

I have a problem when uploading files with Unicode names. For example, when I upload a file named "あ.pdf", I get a file named " ̄チツ.pdf" on my server.

Here is my upload code:

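    for (Part part : request.getParts()) {
        String fileName = extractFileName(part);
        // Test the string contents with isEmpty(); != "" only compares object references.
        if (fileName != null && !fileName.trim().isEmpty()) {
            part.write(savePath + File.separator + fileName);
        }
    }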

I would appreciate any answer.

Thank you.

Upvotes: 0

Views: 2752

Answers (1)

OO7

Reputation: 2807

There is a lot of misinformation floating around about the support of Chinese, Japanese and Korean (CJK) characters. The Unicode Standard supports all of the CJK characters from JIS X 0208, JIS X 0212, JIS X 0221, or JIS X 0213, for example, and many more. This is true no matter which encoding form of Unicode is used: UTF-8, UTF-16, or UTF-32.

UTF-8 is well suited to text that consists mostly of Latin letters, such as English, Spanish, and French, and to most web technologies such as HTML, CSS, and JavaScript. Most files on Linux are UTF-8 by default. UTF-8 is backwards compatible with ASCII: if a file contains only ASCII characters, encoding it as UTF-8 produces the same byte sequence as encoding it as ASCII.
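The ASCII compatibility described above can be checked directly in Java. The following is only a small sketch; the class name `AsciiCompatDemo` and its helper method are made up for illustration. The Japanese letter is written as a `\u` escape so the result does not depend on the source file's own encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of any library.
class AsciiCompatDemo {

    // True when the text encodes to identical bytes in US-ASCII and in UTF-8.
    static boolean sameBytesInAsciiAndUtf8(String text) {
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(ascii, utf8);
    }

    public static void main(String[] args) {
        // Pure ASCII name: both encodings give the same bytes.
        System.out.println(sameBytesInAsciiAndUtf8("report.pdf"));  // true
        // "\u3042" is the hiragana letter A; ASCII cannot represent it.
        System.out.println(sameBytesInAsciiAndUtf8("\u3042.pdf"));  // false
    }
}
```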

UTF-16 is another Unicode encoding form. In UTF-16, every character is encoded in at least 2 bytes, and the commonly used characters take exactly 2 bytes; the remaining ones take 4 bytes. For text made up largely of Chinese characters, such as Chinese and Japanese, UTF-16 produces smaller files than UTF-8, which needs 3 bytes for most of those characters.

There is also UTF-32, which always uses 4 bytes per character. It produces larger files but is simpler to parse. UTF-32 is currently not used much.
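To make the size differences between the three encoding forms concrete, here is a rough sketch that counts the bytes each one needs for a few sample strings. The class and method names are invented for this example. UTF-16BE is used so that no byte order mark is added to the count, and the UTF-32 size is computed as 4 bytes per code point instead of going through a charset lookup.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
class EncodingSizeDemo {

    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // UTF_16BE writes no byte order mark, so only the characters are counted.
    static int utf16Length(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    // UTF-32 uses a fixed 4 bytes per Unicode code point.
    static int utf32Length(String s) {
        return 4 * s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String[] samples = {
            "abc",            // Latin letters
            "\u3042",         // hiragana letter A
            "\uD842\uDFB7"    // U+20BB7, a CJK character outside the 2-byte range
        };
        for (String s : samples) {
            System.out.println(utf8Length(s) + " / " + utf16Length(s) + " / " + utf32Length(s));
        }
        // Prints:
        // 3 / 6 / 12
        // 3 / 2 / 4
        // 4 / 4 / 4
    }
}
```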

The most popular encoding systems today:

1. ASCII. For English. Most widely used before 2000.
2. UTF-8 (Unicode). Used by default on Linux and on much of the Internet.
3. UTF-16 (Unicode). Used by Microsoft Windows and Mac OS X file systems, the Java programming language, …
4. GB 18030. Used in China; contains all Unicode characters.
5. EUC (Extended Unix Code). Used in Japan.
6. ISO/IEC 8859 series. Used for most European languages.

Also have a look at these: "Uploaded filename encoding issue for Japanese and Chinese" and "Java servlet download filename special characters". I hope this solves your problem.
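As a possible starting point, one common cause of this kind of garbling is that the browser sends the file name as UTF-8 bytes while the server decodes the multipart header as ISO-8859-1. Whether that is what happens on your server cannot be told from the question, so treat the sketch below as an assumption to test, not a confirmed fix. The class `FileNameRepairDemo` and both of its methods are made up for illustration; only the `String` and `StandardCharsets` calls are standard Java.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
class FileNameRepairDemo {

    // Simulates the assumed fault: UTF-8 bytes wrongly decoded as ISO-8859-1.
    static String misdecodeAsLatin1(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Undoes that fault: recover the raw bytes, then decode them as UTF-8.
    // Only valid if the name really was misdecoded as ISO-8859-1.
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u3042.pdf";              // hiragana letter A + ".pdf"
        String garbled = misdecodeAsLatin1(original);
        System.out.println(garbled.length());        // 7: three wrong characters plus ".pdf"
        System.out.println(repair(garbled).equals(original));  // true
    }
}
```

In the upload loop this would mean passing the extracted file name through something like `repair(...)` before calling `part.write(...)`. Another commonly suggested approach is to call `request.setCharacterEncoding("UTF-8")` before `request.getParts()`; whether that affects how part headers are decoded depends on the servlet container.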

Upvotes: 1
