LA.27

Reputation: 2238

Read .tar entries in a specific order (C#, SharpZipLib)

Background

On this website I found several examples for reading a .tar archive using SharpZipLib.

Question

In my case, however, I'd like to make sure that entries are read in a specific order based on their names. Is there an easy way to do so?

More details

My .tar archive contains monthly data with per-day files (file-01, file-02, ..., file-31). However, the data provider doesn't seem to pay attention while creating the .tar file and the entries seem to arrive in a random order.

Upvotes: 0

Views: 517

Answers (1)

Mark Adler

Reputation: 112374

You would need to write your own tar decoder. It is up to you to say if you would consider this to be "easy" or not. The tar format is pretty simple.

You would need to first scan through the tar file to find all the headers, saving the file name and the offset and length of the file data for each. Then you could seek back and forth to the offset of any file to read its contents.
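
The scan-then-seek approach described above could be sketched roughly as follows. This is a minimal illustration, not a full tar decoder: the names TarIndex, Entry, Build and Read are hypothetical and not part of SharpZipLib. It assumes an uncompressed, seekable .tar stream, entry names that fit in the 100-byte name field (no ustar prefix, GNU long-name or pax extensions), sizes stored as octal text, and it does not verify header checksums.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

// Hypothetical helper (not a SharpZipLib API): indexes a plain .tar stream.
public static class TarIndex
{
    public sealed class Entry
    {
        public string Name { get; }
        public long DataOffset { get; }
        public long Length { get; }
        public Entry(string name, long dataOffset, long length)
        {
            Name = name; DataOffset = dataOffset; Length = length;
        }
    }

    const int BlockSize = 512;

    // Pass 1: walk every 512-byte header and remember where each file's data starts.
    public static List<Entry> Build(Stream tar)
    {
        if (!tar.CanSeek) throw new ArgumentException("Stream must be seekable.", nameof(tar));
        var entries = new List<Entry>();
        var header = new byte[BlockSize];
        tar.Position = 0;
        while (ReadFully(tar, header) == BlockSize)
        {
            if (header.All(b => b == 0)) break;            // zero block: end of archive
            string name = ReadString(header, 0, 100);      // name field
            string sizeText = ReadString(header, 124, 12).Trim();
            long size = sizeText.Length == 0 ? 0 : Convert.ToInt64(sizeText, 8);
            char type = (char)header[156];                 // '0' or NUL means regular file
            if (type == '0' || type == '\0')
                entries.Add(new Entry(name, tar.Position, size));
            // File data is padded up to a multiple of 512 bytes.
            long padded = (size + BlockSize - 1) / BlockSize * BlockSize;
            tar.Seek(padded, SeekOrigin.Current);
        }
        return entries;
    }

    // Pass 2: seek to a recorded offset and read exactly that entry's bytes.
    public static byte[] Read(Stream tar, Entry e)
    {
        var data = new byte[e.Length];
        tar.Position = e.DataOffset;
        if (ReadFully(tar, data) != data.Length) throw new EndOfStreamException();
        return data;
    }

    static string ReadString(byte[] buf, int offset, int count)
    {
        int end = Array.IndexOf(buf, (byte)0, offset, count);
        int len = end < 0 ? count : end - offset;
        return Encoding.ASCII.GetString(buf, offset, len);
    }

    static int ReadFully(Stream s, byte[] buf)
    {
        int total = 0;
        while (total < buf.Length)
        {
            int n = s.Read(buf, total, buf.Length - total);
            if (n == 0) break;
            total += n;
        }
        return total;
    }
}
```

With the index in hand, the entries can be visited in name order, for example with `TarIndex.Build(stream).OrderBy(e => e.Name, StringComparer.Ordinal)` followed by `TarIndex.Read(stream, entry)` for each one. An ordinal sort is enough here because the day numbers in the file names are zero-padded.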

This would be much more difficult if the tar file were compressed, e.g. if it were a .tar.gz file, as opposed to a .tar file.

The tar format is documented here.

Update:

In a comment, the OP revealed that it is actually a .tar.bz2 file. As noted, that requires additional work to be able to randomly access entries. In addition to building an index to the tar contents, the entire .bz2 file needs to be read to build an index to the compression entry points, which do not correspond to where files start in the tar archive. Then to access a file you first would go to the closest bzip2 entry point that precedes the start of that file data, and decompress from there until you arrive at and then read out that data.

It would be easier to simply rearchive and recompress the files into the zip format, which is designed to randomly access and extract individual entries.
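
Once the data has been repackaged as a .zip, reading the entries in name order needs no custom decoder. The sketch below uses the .NET built-in System.IO.Compression classes rather than SharpZipLib, purely as an illustration; ZipInOrder and ReadSorted are hypothetical names, and the code assumes each entry is small enough to hold in memory.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;

// Hypothetical helper: reads all zip entries sorted by name.
public static class ZipInOrder
{
    public static List<KeyValuePair<string, byte[]>> ReadSorted(Stream zipStream)
    {
        var result = new List<KeyValuePair<string, byte[]>>();
        using (var zip = new ZipArchive(zipStream, ZipArchiveMode.Read, leaveOpen: true))
        {
            // The zip central directory lists every entry, so any order is possible.
            foreach (var entry in zip.Entries.OrderBy(e => e.FullName, StringComparer.Ordinal))
            {
                using (var src = entry.Open())
                using (var buf = new MemoryStream())
                {
                    src.CopyTo(buf);
                    result.Add(new KeyValuePair<string, byte[]>(entry.FullName, buf.ToArray()));
                }
            }
        }
        return result;
    }
}
```

For a file on disk, a stream from `File.OpenRead(path)` can be passed to `ZipInOrder.ReadSorted`.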

Upvotes: 1
