Hitatichi

Reputation: 73

Java localized filenames

How can I set localized filenames in Java? Currently, whenever I click a file with a non-ASCII filename in my application, the Windows save dialog pops up, but it does not display the filename properly if the name contains characters outside ISO-8859-1.

This is the code that sends the file:

InputStream inputStream = null;
try {
    response.resetBuffer();
    response.setContentType(fileStream.getContentType());
    response.setContentLength((int) fileStream.getContentLength());
    // fileName is written into the header as-is, with no encoding applied
    response.addHeader("Content-Disposition",
        "attachment;filename=\"" + fileName + "\"");
    ServletOutputStream stream = response.getOutputStream();
    byte[] buffer = new byte[1024];
    int read = 0;
    int total = 0;
    inputStream = fileStream.getInputStream();
    // read() returns -1 at end of stream
    while ((read = inputStream.read(buffer)) != -1) {
        stream.write(buffer, 0, read);
        total += read;
    }
    response.flushBuffer();
} finally {
    if (inputStream != null) {
        inputStream.close();
    }
}

It would be very helpful if someone could share ideas on how to resolve this issue. Thanks in advance.

Upvotes: 6

Views: 7545

Answers (4)

sporak

Reputation: 516

Although this is an old question, it is still relevant. I found a solution that works in all the browsers I use.

See my post in another thread:
Java servlet download filename special characters

In short, browsers expect the value of the filename parameter to be encoded in the browser's native encoding (unless a different charset is specified for the filename parameter). The browser's native encoding is usually UTF-8 (Firefox, Opera, Chrome), but for IE it is win-1250. Hence, if the value we put into the filename parameter is encoded as UTF-8 or win-1250 according to the user's browser, it should work.

For example, if we have a file named omáčka.xml,
for Firefox, Opera and Chrome I respond with this header (encoded in UTF-8):

Content-Disposition: attachment; filename="omáčka.xml"

and for IE I respond with this header (encoded in win-1250; the byte for č, 0xE8, is shown here as it appears when read as ISO-8859-1):

Content-Disposition: attachment; filename="omáèka.xml"

A Java example is in my post mentioned above.
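To illustrate why the IE header above looks garbled, here is a minimal sketch that encodes the example filename and then views the raw bytes as ISO-8859-1. The class and method names are hypothetical, and it assumes the JVM provides the windows-1250 charset (standard JDK builds normally do):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class HeaderBytesDemo {

    // Encodes text in the given charset, then decodes the raw bytes as
    // ISO-8859-1 so that every byte maps to exactly one visible character.
    static String viewAsLatin1(String text, String charsetName) {
        byte[] raw = text.getBytes(Charset.forName(charsetName));
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "omáčka.xml" written with escapes so the source file encoding does not matter
        String fileName = "om\u00e1\u010dka.xml";
        System.out.println(viewAsLatin1(fileName, "windows-1250"));
        System.out.println(viewAsLatin1(fileName, "UTF-8"));
    }
}
```

With windows-1250 the č becomes the single byte 0xE8, which ISO-8859-1 renders as è; with UTF-8 both á and č become two bytes each, so the same name occupies 12 bytes instead of 10.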

Note #1 (@dkarp):

Be careful with URLEncoder.encode(): this method does not URL-encode its input. It produces form encoding (application/x-www-form-urlencoded), which is very similar but differs in some cases. For example, the space character ' ' is encoded as '+' instead of '%20'.

To perform correct URL encoding, use the URI class instead:

URI uri = new URI(null, null, "foo-ä-€.html", null);
System.out.println(uri.toASCIIString());
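The difference described in the note can be seen side by side in a small sketch. The class and method names are made up for illustration; only URLEncoder.encode(String, String) and the four-argument URI constructor come from the JDK:

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

class EncodingComparison {

    // Form encoding: a space becomes '+'.
    static String formEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // URI path encoding: a space becomes "%20".
    static String uriEncode(String s) {
        try {
            // Arguments are scheme, host, path, fragment; only the path is set.
            return new URI(null, null, s, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a valid URI path: " + s, e);
        }
    }

    public static void main(String[] args) {
        String name = "my file \u00e4.html"; // "my file ä.html"
        System.out.println(formEncode(name)); // my+file+%C3%A4.html
        System.out.println(uriEncode(name));  // my%20file%20%C3%A4.html
    }
}
```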

Upvotes: 1

Nick

Reputation: 41

I faced similar problems with filenames containing Greek characters. I used the code from dkarp's answer (my thanks to dkarp) combined with detection of the browser in use. This is the result:

String userAgent = request.getHeader("User-Agent");
// The header may be absent, so guard against null before searching it
boolean isInternetExplorer = userAgent != null && userAgent.contains("MSIE");
if (isInternetExplorer) {
    // IE: URL-encode the filename
    response.setHeader("Content-disposition", "attachment; filename=\"" + URLEncoder.encode(filename, "utf-8") + "\"");
} else {
    // Other browsers: RFC 2047 encoded-word
    response.setHeader("Content-disposition", "attachment; filename=\"" + MimeUtility.encodeWord(filename) + "\"");
}

I tested it with Firefox 3.6, Chrome 10.0 and Internet Explorer 8, and it seems to work fine.

Upvotes: 4

dkarp

Reputation: 14763

What gustafc says is correct, but it doesn't get you where you want to be. RFC 2231 allows you to use an alternative format for non-ASCII Content-Type and Content-Disposition parameters, but not all browsers support it. The way that's most likely to work, unfortunately, is to ignore what RFC 2183 says and use RFC 2047 encoded-words in the response:

response.addHeader("Content-Disposition", "attachment; " +
    "filename=\"" + MimeUtility.encodeWord(fileName, "utf-8", "Q") + "\"");

Note that this may not work for all browsers. Some variants of IE require that you URL-encode the value instead:

response.addHeader("Content-Disposition",
    "attachment; filename=" + URLEncoder.encode(fileName, "utf-8"));
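For comparison, the RFC 2231 alternative format mentioned at the start of this answer looks roughly like the sketch below. This is only an illustration: the class and method names are hypothetical, the set of characters left unescaped is a deliberately conservative assumption rather than the full set the RFC permits, and, as noted, not all browsers understand the result.

```java
import java.nio.charset.StandardCharsets;

class Rfc2231Disposition {

    // Percent-encodes the UTF-8 bytes of value, leaving only ASCII letters,
    // digits, '-', '.' and '_' unescaped (a conservative, assumed safe set).
    static String percentEncode(String value) {
        StringBuilder out = new StringBuilder();
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            boolean safe = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9') || c == '-' || c == '.' || c == '_';
            if (safe) {
                out.append((char) c);
            } else {
                out.append('%').append(String.format("%02X", c));
            }
        }
        return out.toString();
    }

    // Builds a header value of the form: attachment; filename*=UTF-8''<encoded>
    // The empty field between the two apostrophes is the optional language tag.
    static String contentDisposition(String fileName) {
        return "attachment; filename*=UTF-8''" + percentEncode(fileName);
    }

    public static void main(String[] args) {
        // "omáčka.xml" -> attachment; filename*=UTF-8''om%C3%A1%C4%8Dka.xml
        System.out.println(contentDisposition("om\u00e1\u010dka.xml"));
    }
}
```

The returned string would be passed as the second argument to response.addHeader("Content-Disposition", ...) in place of the values shown above.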

Upvotes: 13

gustafc

Reputation: 28865

From section 2.3 of the spec (RFC 2183), it seems you can't use non-US-ASCII characters:

Current [RFC 2045] grammar restricts parameter values (and hence Content-Disposition filenames) to US-ASCII. We recognize the great desirability of allowing arbitrary character sets in filenames, but it is beyond the scope of this document to define the necessary mechanisms. We expect that the basic [RFC 1521] `value' specification will someday be amended to allow use of non-US-ASCII characters, at which time the same mechanism should be used in the Content-Disposition filename parameter.

Upvotes: 2
