membersound

Reputation: 86925

Read a zip file line by line, or decompress first?

I have a zipped CSV file that is processed regularly, 3-4 times a day. Its size ranges from 500 to 1000 MB.

I want to read the contained CSV line by line. Is it better to unzip the file first and then read it, or can I just as well read it through a Java zip stream?

Is there any advantage (performance?) of one approach over the other?

Upvotes: 0

Views: 474

Answers (3)

Nicholas Miller

Reputation: 4410

Zipping and unzipping are both expensive in time.

If you can access your .csv file without unzipping it (I don't know whether it is readable in its compressed state), you can open it as a RandomAccessFile and work only with particular lines instead of the entire file.

This may or may not be applicable, but at the very least, it would improve performance greatly since you would only read/write data from where you need to.

Upvotes: 0

VGR

Reputation: 44413

One of the slowest activities for a computer is hard drive access (at least until SSDs are more common). So unzipping it and then reading the unzipped file will be significantly slower.

You will get much better performance reading lines directly from a ZipInputStream.
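As a rough illustration of reading lines directly from a ZipInputStream, here is a minimal sketch. It assumes the archive holds the CSV as its first entry and that the text is UTF-8; the class name `ZipLineReader` and the method `forEachLine` are made up for this example.

```java
import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Consumer;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not part of any library.
public class ZipLineReader {

    // Passes every line of the first archive entry to the handler and
    // returns the number of lines read. Nothing is extracted to disk.
    public static long forEachLine(Path zipFile, Consumer<String> handler) throws IOException {
        try (ZipInputStream zin = new ZipInputStream(
                new BufferedInputStream(Files.newInputStream(zipFile)))) {
            // Position the stream at the first entry (assumed to be the CSV).
            ZipEntry entry = zin.getNextEntry();
            if (entry == null) {
                throw new IOException("Archive contains no entries: " + zipFile);
            }
            // Assumption: the CSV is UTF-8 encoded.
            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(zin, StandardCharsets.UTF_8));
            long count = 0;
            String line;
            // readLine() returns null once the current entry is exhausted.
            while ((line = reader.readLine()) != null) {
                handler.accept(line);
                count++;
            }
            return count;
        }
    }
}
```

A call such as `ZipLineReader.forEachLine(path, line -> process(line))` (with `process` being your own method) keeps only one line in memory at a time, so the 500-1000 MB file never has to be fully decompressed to disk.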

Upvotes: 1

Daniel Nuriyev

Reputation: 643

In my opinion, unzipping is faster and possibly simpler. If performance is important, test both methods. If disk space is limited, which is rarely the case nowadays, you have no choice but to read from within the zip.
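To test the decompress-first variant, something like the following sketch could be timed against a streaming reader on your real data. It assumes the CSV is the first entry of the archive and is UTF-8; `UnzipThenRead` and its methods are hypothetical names, and a single `System.nanoTime()` measurement is only a rough indicator, not a proper benchmark.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not part of any library.
public class UnzipThenRead {

    // Extracts the first archive entry to a temporary file and returns its path.
    public static Path extractFirstEntry(Path zipFile) throws IOException {
        Path target = Files.createTempFile("extracted", ".csv");
        try (ZipInputStream zin = new ZipInputStream(Files.newInputStream(zipFile))) {
            if (zin.getNextEntry() == null) {
                throw new IOException("Archive contains no entries: " + zipFile);
            }
            // Copies until the end of the current entry.
            Files.copy(zin, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }

    // Counts the lines of an uncompressed text file (assumed UTF-8).
    public static long countLines(Path csv) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(csv, StandardCharsets.UTF_8)) {
            long count = 0;
            while (reader.readLine() != null) {
                count++;
            }
            return count;
        }
    }

    // Runs extract + read once and returns the elapsed wall-clock milliseconds.
    public static long timeUnzipThenReadMillis(Path zipFile) throws IOException {
        long start = System.nanoTime();
        Path csv = extractFirstEntry(zipFile);
        try {
            countLines(csv);
        } finally {
            Files.deleteIfExists(csv); // clean up the temporary copy
        }
        return (System.nanoTime() - start) / 1_000_000;
    }
}
```

Run each variant several times on the actual file and storage, since disk caching and hardware can change which approach wins.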

Upvotes: 0
