Reputation: 55
I am generating a zip file containing a CSV using ZipOutputStream. I have passed the encoding UTF-8, but German umlauts do not survive the round trip: after the archive is uncompressed, they do not appear properly in the file.
I am not sure if the problem is with compression itself or the decompression.
All the topics related to this issue are mainly about special characters in the filename, but for me the problem is appearing in the data.
val zos = ZipOutputStream(outputStream, StandardCharsets.UTF_8)
val entry = ZipEntry("file1.csv")
zos.putNextEntry(entry)
val writer = CsvWriter(zos)
for (entr in data)
    writer.appendRow { entr.forEach { write(it) } }
zos.closeEntry()
zos.close()
Upvotes: 1
Views: 1621
Reputation: 8204
I don't think that your example is correct, because you're passing a ZipOutputStream directly to CsvWriter. Assuming you are using OpenCSV, the CsvWriter constructor needs a Writer, not an OutputStream.
In Java, I/O streams are either byte streams, which carry raw data, or character streams, which carry Unicode characters. To convert from one to the other, you must supply a character encoding, which defines how characters map to and from bytes. (If you don't provide one, Java uses the default character encoding, which depends on the platform and JVM version: it is UTF-8 by default since Java 18, but older JVMs on Windows typically use a legacy code page.) InputStream and OutputStream are byte streams, while the corresponding character streams are called Reader and Writer.
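To make the byte/character distinction concrete, here is a minimal sketch of how the same character becomes different bytes depending on the encoding handed to the Writer. The helper name encodeViaWriter is made up for this illustration.

```kotlin
import java.io.ByteArrayOutputStream
import java.io.OutputStreamWriter
import java.nio.charset.Charset
import java.nio.charset.StandardCharsets

// Hypothetical helper (not from the question): pushes text through an
// OutputStreamWriter and returns the raw bytes that reach the byte stream.
fun encodeViaWriter(text: String, charset: Charset): ByteArray {
    val bytes = ByteArrayOutputStream()
    // use {} closes the writer, which also flushes its internal buffer.
    OutputStreamWriter(bytes, charset).use { it.write(text) }
    return bytes.toByteArray()
}
```

With this helper, "ä" encodes to two bytes (0xC3 0xA4) in UTF-8 and to a single byte (0xE4) in ISO-8859-1. Decoding the UTF-8 bytes as ISO-8859-1 yields "Ã¤", which is the typical look of a mangled umlaut.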
You have a ZipOutputStream, which is a byte stream. The OpenCSV CsvWriter constructor requires a Writer, a character stream, which makes sense because CSV is a text format. (I imagine this would be true of other CsvWriter implementations as well.) You should wrap your ZipOutputStream in an instance of OutputStreamWriter, which will convert the CSV characters into bytes. You can specify the character encoding in the OutputStreamWriter constructor.
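A rough sketch of that wrapping, under some assumptions: I don't know which CSV library you are using, so the rows are joined by hand here instead of going through CsvWriter, and zipCsv and readFirstEntry are hypothetical helper names. If your CSV library accepts a Writer, you would hand it the OutputStreamWriter instead.

```kotlin
import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream
import java.io.OutputStreamWriter
import java.nio.charset.StandardCharsets
import java.util.zip.ZipEntry
import java.util.zip.ZipInputStream
import java.util.zip.ZipOutputStream

// Hypothetical helper: zips the rows as one CSV entry, encoding the text as UTF-8.
fun zipCsv(rows: List<List<String>>, entryName: String = "file1.csv"): ByteArray {
    val out = ByteArrayOutputStream()
    ZipOutputStream(out, StandardCharsets.UTF_8).use { zos ->
        zos.putNextEntry(ZipEntry(entryName))
        // The writer's charset, not the ZipOutputStream charset, controls the content encoding.
        val writer = OutputStreamWriter(zos, StandardCharsets.UTF_8)
        for (row in rows) {
            // Naive join for illustration only; a CSV library would also handle quoting.
            writer.write(row.joinToString(","))
            writer.write("\n")
        }
        // Flush buffered characters into the entry before it is closed.
        writer.flush()
        zos.closeEntry()
    }
    return out.toByteArray()
}

// Hypothetical helper: reads the first entry back and decodes it as UTF-8.
fun readFirstEntry(zip: ByteArray): String =
    ZipInputStream(ByteArrayInputStream(zip), StandardCharsets.UTF_8).use { zis ->
        checkNotNull(zis.nextEntry) { "archive has no entries" }
        String(zis.readBytes(), StandardCharsets.UTF_8)
    }
```

The flush before closeEntry matters: OutputStreamWriter buffers internally, so without it the last characters may never reach the entry. Whatever reads the CSV afterwards also has to decode it as UTF-8, otherwise the umlauts will still look broken even though the archive is fine.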
Upvotes: 2
Reputation: 43738
From the documentation of the ZipOutputStream constructor:
charset - the charset to be used to encode the entry names and comment
So setting UTF-8 there has no effect on the content, which must already be a stream of bytes by the time it reaches the ZipOutputStream.
The problem must occur in CsvWriter.
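A small sketch to check this, with a made-up helper name (roundTripThroughZip): it writes the same content bytes into an archive using whatever charset is passed to ZipOutputStream and reads them back.

```kotlin
import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream
import java.nio.charset.Charset
import java.nio.charset.StandardCharsets
import java.util.zip.ZipEntry
import java.util.zip.ZipInputStream
import java.util.zip.ZipOutputStream

// Hypothetical helper: stores the given bytes in a zip entry and returns
// the bytes read back from that entry. zipCharset is only handed to the
// zip streams; it is never applied to the content.
fun roundTripThroughZip(content: ByteArray, zipCharset: Charset): ByteArray {
    val out = ByteArrayOutputStream()
    ZipOutputStream(out, zipCharset).use { zos ->
        zos.putNextEntry(ZipEntry("file1.csv"))
        zos.write(content)
        zos.closeEntry()
    }
    return ZipInputStream(ByteArrayInputStream(out.toByteArray()), zipCharset).use { zis ->
        checkNotNull(zis.nextEntry) { "archive has no entries" }
        zis.readBytes()
    }
}
```

Whichever charset is passed, the bytes come back unchanged, so the encoding of the umlauts is decided wherever the text is turned into bytes, before the zip stream sees it.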
Upvotes: 2