user1766873

Random-access Zip file without writing it to disk

I have a 1-2 GB zip file with 500k-1M entries. I need to get files by name in a fraction of a second, without unpacking the whole archive. If the file is stored on an HDD, this works fine:

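import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Enumeration;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class ZipMapper {
    private final Map<String, ZipEntry> map = new HashMap<>();
    private final ZipFile zf;

    public ZipMapper(File file) throws IOException {
        zf = new ZipFile(file);

        // Index every entry by name once, so each lookup is a hash-map hit.
        Enumeration<? extends ZipEntry> en = zf.entries();
        while (en.hasMoreElements()) {
            ZipEntry ze = en.nextElement();
            map.put(ze.getName(), ze);
        }
    }

    // Node is the application's own type; loadFromStream parses one entry.
    public Node getNode(String key) throws IOException {
        ZipEntry ze = map.get(key);
        if (ze == null) {
            throw new FileNotFoundException("No zip entry named " + key);
        }
        // ZipFile reads this entry's data directly, without unpacking the rest.
        try (InputStream in = zf.getInputStream(ze)) {
            return Node.loadFromStream(in);
        }
    }
}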

But what can I do if the program downloads the zip file from Amazon S3 and only has its InputStream (or a byte array)? Downloading 1 GB takes about 1 second, but writing it to disk may take some time, and handling multiple files is slightly harder because there is no garbage collector for files on disk.

ZipInputStream does not allow random access to entries.
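To illustrate the limitation: the closest ZipInputStream gets to lookup by name is a single forward-only scan that keeps the entries it was asked for. This is a minimal sketch, assuming the wanted names are known before reading starts (SequentialZipReader is a hypothetical name, and readAllBytes needs Java 9+). Every additional lookup would need a fresh pass over the whole stream, which is why this does not fit the use case above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: one sequential pass that collects only the requested entries.
final class SequentialZipReader {
    static Map<String, byte[]> extract(InputStream in, Set<String> wanted) throws IOException {
        Map<String, byte[]> found = new HashMap<>();
        try (ZipInputStream zin = new ZipInputStream(in)) {
            ZipEntry e;
            // Stop early once everything was found; getNextEntry() skips the
            // unread remainder of the previous entry.
            while (found.size() < wanted.size() && (e = zin.getNextEntry()) != null) {
                if (wanted.contains(e.getName())) {
                    // read() returns -1 at the end of the current entry.
                    found.put(e.getName(), zin.readAllBytes());
                }
            }
        }
        return found;
    }
}
```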

It would be nice to create a virtual File in memory backed by a byte array, but I couldn't find a way to do that.
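There is no JDK class that presents a byte[] as a java.io.File, but the ZIP format itself allows random access: the central directory at the end of the archive records each entry's name, compression method, sizes and the offset of its local header. Below is a minimal JDK-only sketch of indexing a byte array that way. InMemoryZip is a hypothetical name, and the sketch assumes UTF-8 entry names, STORED or DEFLATED entries, no encryption, and no per-entry ZIP64 fields (every entry and offset under 2 GB, which a Java array requires anyway). It walks the central directory by its byte size instead of trusting the 16-bit entry count, which cannot represent 500k+ entries; that in turn assumes the tool that wrote the archive kept the 32-bit directory size and offset fields filled in.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;
import java.util.zip.ZipException;

// Hypothetical helper: random access to a zip archive held in a byte[].
// Assumes UTF-8 names, STORED/DEFLATED entries, no encryption, no ZIP64 fields.
final class InMemoryZip {
    private final byte[] zip;
    private final ByteBuffer buf; // little-endian view for reading header fields
    private final Map<String, Integer> index = new HashMap<>(); // name -> central header offset

    InMemoryZip(byte[] zip) throws ZipException {
        this.zip = zip;
        this.buf = ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN);
        int eocd = findEndOfCentralDirectory();
        int cdSize = buf.getInt(eocd + 12);
        int cdStart = buf.getInt(eocd + 16);
        // Walk the directory by byte range, not by the 16-bit entry count.
        for (int p = cdStart; p < cdStart + cdSize; ) {
            if (buf.getInt(p) != 0x02014b50) {
                throw new ZipException("bad central directory header at " + p);
            }
            int nameLen = u16(p + 28);
            int extraLen = u16(p + 30);
            int commentLen = u16(p + 32);
            index.put(new String(zip, p + 46, nameLen, StandardCharsets.UTF_8), p);
            p += 46 + nameLen + extraLen + commentLen;
        }
    }

    private int u16(int pos) {
        return buf.getShort(pos) & 0xFFFF;
    }

    private int findEndOfCentralDirectory() throws ZipException {
        // The end record is 22 bytes, followed by a comment of at most 65535 bytes.
        int stop = Math.max(0, zip.length - 22 - 0xFFFF);
        for (int p = zip.length - 22; p >= stop; p--) {
            if (buf.getInt(p) == 0x06054b50) {
                return p;
            }
        }
        throw new ZipException("end of central directory not found");
    }

    /** Returns the uncompressed bytes of the named entry, or null if absent. */
    byte[] get(String name) throws IOException {
        Integer cd = index.get(name);
        if (cd == null) {
            return null;
        }
        int method = u16(cd + 10);
        int compSize = buf.getInt(cd + 20);
        int size = buf.getInt(cd + 24);
        int local = buf.getInt(cd + 42);
        // Local header sizes may be zero (data descriptor), so take sizes from
        // the central directory and read only the local name/extra lengths.
        int dataStart = local + 30 + u16(local + 26) + u16(local + 28);
        byte[] out = new byte[size];
        if (method == 0) { // STORED
            System.arraycopy(zip, dataStart, out, 0, size);
        } else if (method == 8) { // DEFLATED
            Inflater inf = new Inflater(true); // raw deflate, no zlib wrapper
            try {
                inf.setInput(zip, dataStart, compSize);
                int n = 0;
                while (n < size) {
                    int r = inf.inflate(out, n, size - n);
                    if (r == 0 && (inf.finished() || inf.needsInput() || inf.needsDictionary())) {
                        throw new ZipException("truncated entry " + name);
                    }
                    n += r;
                }
            } catch (DataFormatException e) {
                throw new ZipException("corrupt entry " + name + ": " + e.getMessage());
            } finally {
                inf.end();
            }
        } else {
            throw new ZipException("unsupported compression method " + method);
        }
        return out;
    }
}
```

Building the index reads only the central directory, and each get call decompresses just the requested entry.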

Upvotes: 7

Views: 2565

Answers (5)

user1766873

The Blackbox library only has an Extract(String name, String outputPath) method. It does seem able to randomly access any file in a seekable zip stream, but it can't write the result to a byte array or return a stream.

I couldn't find any documentation for ShrinkWrap, and I couldn't find any suitable implementations of FileSystem/FileSystemProvider etc. either.

However, it turned out that the Amazon EC2 instance I'm running (Large) writes a 1 GB file to disk in about 1 second. So I just write the file to disk and use ZipFile.

If the disk were slow, I think a RAM disk would be the easiest solution.
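Writing the download to a temporary file and opening it with ZipFile, as this answer ends up doing, can be sketched with JDK APIs only. TempZip is a hypothetical name. The ZipFile.OPEN_DELETE mode flag helps with the clean-up concern from the question: per its Javadoc, the file is deleted some time between opening and closing, while the ZipFile can still read it until it is closed. The dir parameter is an assumption added for illustration; pointing it at a RAM-backed mount (on many Linux systems /dev/shm is a tmpfs) would be one way to try the RAM disk idea.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipFile;

// Hypothetical helper: spool a downloaded zip to a temp file, then open it
// with ZipFile for random access by entry name.
final class TempZip {
    /**
     * @param download stream obtained from the download (e.g. from S3)
     * @param dir      directory for the temp file; could be a RAM disk mount
     */
    static ZipFile open(InputStream download, Path dir) throws IOException {
        Path tmp = Files.createTempFile(dir, "download-", ".zip");
        try {
            // createTempFile already created the file, so allow overwriting it.
            Files.copy(download, tmp, StandardCopyOption.REPLACE_EXISTING);
            File f = tmp.toFile();
            // OPEN_DELETE: the JVM deletes the file itself; contents stay
            // readable through this ZipFile until close().
            return new ZipFile(f, ZipFile.OPEN_READ | ZipFile.OPEN_DELETE);
        } catch (IOException | RuntimeException e) {
            Files.deleteIfExists(tmp); // do not leak the temp file on failure
            throw e;
        }
    }
}
```

The returned ZipFile can be passed to an index like ZipMapper in place of the File-based constructor.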

Upvotes: 0

C-Otto

Reputation: 5843

A completely different approach: if the server has the file on disk (and possibly already cached in RAM), have it give you the file(s) directly. In other words, send the names of the files you need, and let the server extract and deliver them.

Upvotes: 0

Nickolay Olshevsky

Reputation: 14160

You can use the SecureBlackbox library; it allows ZIP operations on any seekable stream.

Upvotes: 1

Yair Zaslavsky

Reputation: 4137


I think you should consider using your OS to create an in-memory file system (i.e., a RAM drive).
In addition, take a look at the FileSystems API.
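A minimal sketch of the FileSystems API route: since Java 7 the JDK ships a zip file system provider, and FileSystems.newFileSystem can open an archive as a FileSystem whose paths are the entry names. ZipFsReader is a hypothetical wrapper that keeps the file system open so repeated lookups do not reopen the archive. The sketch opens an archive stored as an ordinary file, which could live on a RAM drive.

```java
import java.io.IOException;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical wrapper around the JDK's zip file system provider.
final class ZipFsReader implements AutoCloseable {
    private final FileSystem fs;

    ZipFsReader(Path zip) throws IOException {
        // The (ClassLoader) cast selects the Java 7+ overload unambiguously.
        fs = FileSystems.newFileSystem(zip, (ClassLoader) null);
    }

    /** Reads one entry by name; throws NoSuchFileException if it is absent. */
    byte[] read(String entryName) throws IOException {
        return Files.readAllBytes(fs.getPath(entryName));
    }

    @Override
    public void close() throws IOException {
        fs.close();
    }
}
```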

Upvotes: 0

Puce

Reputation: 38142

You could mark the file to be deleted on exit.

If you want to go for an in-memory approach, have a look at the new NIO.2 File API. Oracle provides a file system provider for zip/jar files, and AFAIK ShrinkWrap provides an in-memory file system. You could try a combination of the two.

I've written some utility methods to copy directories and files to/from a Zip file using the NIO.2 File API (the library is Open Source):

Maven:

<dependency>  
    <groupId>org.softsmithy.lib</groupId>  
    <artifactId>softsmithy-lib-core</artifactId>  
    <version>0.3</version>  
</dependency>  

Tutorial:

http://softsmithy.sourceforge.net/lib/current/docs/tutorial/nio-file/index.html

API: CopyFileVisitor.copy

In particular, PathUtils.resolve helps with resolving paths across file systems.
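The softsmithy utilities themselves are not reproduced here. As a JDK-only sketch of the same idea (not the softsmithy API), the code below walks a zip file system and copies every entry into a directory on the default file system. ZipTreeCopy and crossResolve are hypothetical names. The cross-file-system step matters because the default provider rejects a Path from another provider in Path.resolve(Path) with ProviderMismatchException, so the relative path is rebuilt one name element at a time from strings. The normalize/startsWith check guards against entry names containing "..".

```java
import java.io.IOException;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical JDK-only helper: copy a whole zip archive into targetDir.
final class ZipTreeCopy {
    static void copyOut(Path zip, Path targetDir) throws IOException {
        try (FileSystem zipFs = FileSystems.newFileSystem(zip, (ClassLoader) null)) {
            Path root = zipFs.getPath("/");
            Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs)
                        throws IOException {
                    Files.createDirectories(crossResolve(targetDir, root.relativize(dir)));
                    return FileVisitResult.CONTINUE;
                }

                @Override
                public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                        throws IOException {
                    // Files.copy works across providers (zip -> default file system).
                    Files.copy(file, crossResolve(targetDir, root.relativize(file)));
                    return FileVisitResult.CONTINUE;
                }
            });
        }
    }

    // Resolve a path from one file system against a base on another one.
    static Path crossResolve(Path base, Path relative) throws IOException {
        Path result = base;
        for (Path part : relative) {
            result = result.resolve(part.toString());
        }
        Path resolved = result.normalize();
        if (!resolved.startsWith(base.normalize())) {
            throw new IOException("entry escapes target directory: " + relative);
        }
        return resolved;
    }
}
```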

Upvotes: 2
