Reputation: 2850
I'd like to take my input stream and upload gzipped parts to S3, in a similar fashion to the multipart uploader. However, I want to store the individual parts as separate files in S3 rather than combining them into a single file.
To do so, I have created the following methods.
But when I try to decompress each part, gzip throws an error: gzip: file_part_2.log.gz: not in gzip format. I'm not sure if I'm compressing each part correctly.
If I call gzip.finish(), reset the byte array output stream with baos.reset(), and re-initialise the GZIPOutputStream with gzip = new GZIPOutputStream(baos), then I am able to decompress each part. I'm not sure why I need to do this; is there a similar reset for the GZIPOutputStream?
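That is, something like this inside the loop (finishing the current stream before each upload):

    gzip.finish();                       // write the gzip trailer for this part
    partCounter = this.uploadChunk(bucket, key, baos, partCounter);
    baos.reset();
    gzip = new GZIPOutputStream(baos);   // fresh stream for the next part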
public void upload(String bucket, String key, InputStream is, int partSize) throws Exception
{
    String row;
    BufferedReader br = new BufferedReader(new InputStreamReader(is, ENCODING));
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    GZIPOutputStream gzip = new GZIPOutputStream(baos);
    int partCounter = 0;
    int lineCounter = 0;
    while ((row = br.readLine()) != null) {
        if (baos.size() >= partSize) {
            partCounter = this.uploadChunk(bucket, key, baos, partCounter);
            baos.reset();
        } else if (!row.equals("")) {
            row += '\n';
            gzip.write(row.getBytes(ENCODING));
            lineCounter++;
        }
    }
    gzip.finish();
    br.close();
    baos.close();
    if (lineCounter == 0) {
        throw new Exception("Aborting upload, file contents is empty!");
    }
    // Final chunk
    if (baos.size() > 0) {
        this.uploadChunk(bucket, key, baos, partCounter);
    }
}

private int uploadChunk(String bucket, String key, ByteArrayOutputStream baos, int partCounter)
{
    ObjectMetadata metaData = new ObjectMetadata();
    metaData.setContentLength(baos.size());
    String[] path = key.split("/");
    String[] filename = path[path.length - 1].split("\\.");
    filename[0] = filename[0] + "_part_" + partCounter;
    path[path.length - 1] = String.join(".", filename);
    amazonS3.putObject(
        bucket,
        String.join("/", path),
        new ByteArrayInputStream(baos.toByteArray()),
        metaData
    );
    log.info("Upload chunk {}, size: {}", partCounter, baos.size());
    return partCounter + 1;
}
Upvotes: 0
Views: 739
Reputation: 26
The problem is that you're using a single GZIPOutputStream for all chunks. So you're actually writing pieces of one gzipped file, which would have to be recombined to be useful.
Making the minimal change to your existing code:
if (baos.size() >= partSize) {
    gzip.close();
    partCounter = this.uploadChunk(bucket, key, baos, partCounter);
    baos = new ByteArrayOutputStream();
    gzip = new GZIPOutputStream(baos);
}
You need to do the same at the end of the loop. Also, you shouldn't throw an exception if the line counter is 0: it's entirely possible that the file is exactly divisible into a set number of chunks.
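For the end-of-loop handling, that might look something like this (a sketch of the minimal version, keeping your uploadChunk() as-is):

    // Finish the last gzip stream so its trailer is written, then upload
    // whatever remains. Note: a part with no lines still contains a gzip
    // header/trailer, so you may want to track whether anything was written.
    gzip.close();
    br.close();
    if (baos.size() > 0) {
        this.uploadChunk(bucket, key, baos, partCounter);
    }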
To improve the code, I would wrap the GZIPOutputStream in an OutputStreamWriter and a BufferedWriter, so that you don't need to do the string-to-bytes conversion explicitly.
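Roughly like this (reusing the ENCODING constant from the question):

    GZIPOutputStream gzip = new GZIPOutputStream(baos);
    BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(gzip, ENCODING));
    // the writer chain handles the string-to-bytes conversion
    writer.write(row);
    writer.newLine();
    writer.flush();  // push buffered characters down to the gzip stream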
And lastly, don't use ByteArrayOutputStream.reset(). It doesn't save you anything over just creating a new stream, and it opens the door to errors if you ever forget to reset.
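Putting those suggestions together, a rough sketch (untested, reusing your ENCODING constant and uploadChunk() method) might look like:

    public void upload(String bucket, String key, InputStream is, int partSize) throws Exception
    {
        BufferedReader br = new BufferedReader(new InputStreamReader(is, ENCODING));
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        // syncFlush=true so that writer.flush() pushes compressed bytes
        // through the deflater, making baos.size() a usable size check
        GZIPOutputStream gzip = new GZIPOutputStream(baos, true);
        BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(gzip, ENCODING));
        int partCounter = 0;
        boolean pendingData = false;
        String row;
        while ((row = br.readLine()) != null) {
            if (row.isEmpty()) {
                continue;
            }
            writer.write(row);
            writer.newLine();
            writer.flush();
            pendingData = true;
            if (baos.size() >= partSize) {
                writer.close();  // finishes the gzip stream, writing its trailer
                partCounter = this.uploadChunk(bucket, key, baos, partCounter);
                baos = new ByteArrayOutputStream();
                gzip = new GZIPOutputStream(baos, true);
                writer = new BufferedWriter(new OutputStreamWriter(gzip, ENCODING));
                pendingData = false;
            }
        }
        br.close();
        writer.close();
        // no exception on empty input: the last chunk may legitimately be empty
        if (pendingData) {
            this.uploadChunk(bucket, key, baos, partCounter);
        }
    }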
Upvotes: 1