Mad Scientist

Reputation: 917

Java 7zip compression is too big

I have a Java program that searches for a folder named with yesterday's date, compresses it to a 7z file, and deletes the folder at the end. I have noticed that the 7z archives generated by my program are way too big. When I compress the same files (873 KB in total) with a program like 7-Zip File Manager, it produces a 5 KB archive, while my program produces a 737 KB archive. I am afraid that my program does not actually compress to the 7z format but produces an ordinary zip file. Is there a way to change my code so that it generates a 7z file as small as the one 7-Zip File Manager produces?

package SevenZip;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.concurrent.TimeUnit;

import org.apache.commons.compress.archivers.sevenz.SevenZArchiveEntry;
import org.apache.commons.compress.archivers.sevenz.SevenZOutputFile;

public class SevenZipUtils {

    public static void main(String[] args) throws InterruptedException, IOException {

        String sourceFolder = "C:/Users/Ferid/Documents/Dates/";
        String outputZipFile = "/Users/Ferid/Documents/Dates";
        int sleepTime = 0;
        compress(sleepTime, outputZipFile, sourceFolder);
    }

    public static boolean deleteDirectory(File directory, int sleepTime) throws InterruptedException {
        if (directory.exists()) {
            File[] files = directory.listFiles();
            if (null != files) {
                for (int i = 0; i < files.length; i++) {
                    if (files[i].isDirectory()) {
                        deleteDirectory(files[i], sleepTime);
                        System.out.println("Folder deleted: " + files[i]);
                    } else {
                        files[i].delete();
                        System.out.println("File deleted: " + files[i]);
                    }
                }
            }
        }
        TimeUnit.SECONDS.sleep(sleepTime);
        return (directory.delete());
    }

    public static void compress(int sleepTime, String outputZipFile, String sourceFolder)
            throws IOException, InterruptedException {

        // finds the folder with yesterday's date
        final Calendar cal = Calendar.getInstance();
        cal.add(Calendar.DATE, -1); // date of yesterday
        String timeStamp = new SimpleDateFormat("yyyyMMdd").format(cal.getTime()); // format the date
        System.out.println("Yesterday was " + timeStamp);

        if (sourceFolder.endsWith("/")) { // add yesterday folder to sourcefolder path
            sourceFolder = sourceFolder + timeStamp;
        } else {
            sourceFolder = sourceFolder + "/" + timeStamp;
        }

        if (outputZipFile.endsWith("/")) { // add yesterday folder name to outputZipFile path
            outputZipFile = outputZipFile + timeStamp + ".7z";
        } else {
            outputZipFile = outputZipFile + "/" + timeStamp + ".7z";
        }

        File file = new File(sourceFolder);

        if (file.exists()) {
            try (SevenZOutputFile out = new SevenZOutputFile(new File(outputZipFile))) {
                addToArchiveCompression(out, file, ".");
            }
            // the archive is only complete once it has been closed, so delete the source afterwards
            System.out.println("Files successfully compressed");
            deleteDirectory(file, sleepTime);
        } else {
            System.out.println("Folder does not exist");
        }
    }

    private static void addToArchiveCompression(SevenZOutputFile out, File file, String dir) throws IOException {
        String name = dir + File.separator + file.getName();
        if (file.isFile()) {
            SevenZArchiveEntry entry = out.createArchiveEntry(file, name);
            out.putArchiveEntry(entry);

            // try-with-resources closes the input file even if writing fails
            try (FileInputStream in = new FileInputStream(file)) {
                byte[] b = new byte[8192];
                int count;
                while ((count = in.read(b)) != -1) {
                    out.write(b, 0, count);
                }
            }
            out.closeArchiveEntry();
            System.out.println("File added: " + file.getName());
        } else if (file.isDirectory()) {
            File[] children = file.listFiles();
            if (children != null) {
                for (File child : children) {
                    addToArchiveCompression(out, child, name);
                }
            }
            System.out.println("Directory added: " + file.getName());
        } else {
            System.out.println(file.getName() + " is not supported");
        }
    }
}

I am using the Apache Commons Compress library.

EDIT: Here is a link to where I got some of the Apache Commons Compress code from.

Upvotes: 8

Views: 3596

Answers (3)

Matthieu

Reputation: 3117

I don't have enough rep to comment anymore, so here are my thoughts:

  • I don't see where you set the compression level, so it could be that SevenZOutputFile uses no (or very low) compression. As @CristiFati said, the difference in compression is odd, especially for text files.
  • As noted by @df778899, there is no support for solid compression, which is how the best compression ratio is achieved, so you won't be able to do as well as the 7z command line.
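
To illustrate the first point, here is a small JDK-only sketch showing how much the chosen compression level alone can change the output size for repetitive text. It uses java.util.zip.Deflater, not Commons Compress or LZMA, so the absolute numbers will differ from a 7z archive; the class name and the sample data are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

public class CompressionLevelDemo {

    // Compresses the input with java.util.zip.Deflater at the given level and
    // returns the number of compressed bytes (including the zlib header/trailer).
    static int compressedSize(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        try {
            deflater.setInput(input);
            deflater.finish();
            byte[] buffer = new byte[8192];
            int total = 0;
            while (!deflater.finished()) {
                total += deflater.deflate(buffer);
            }
            return total;
        } finally {
            deflater.end(); // release native zlib memory
        }
    }

    public static void main(String[] args) {
        // Repetitive, text-like sample data (hypothetical log lines)
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10_000; i++) {
            sb.append("2018-01-01 INFO record ").append(i % 100).append('\n');
        }
        byte[] data = sb.toString().getBytes(StandardCharsets.UTF_8);

        System.out.println("original: " + data.length + " bytes");
        System.out.println("level 0:  " + compressedSize(data, Deflater.NO_COMPRESSION) + " bytes");
        System.out.println("level 9:  " + compressedSize(data, Deflater.BEST_COMPRESSION) + " bytes");
    }
}
```

Level 0 only wraps the data in stored blocks, so its output is slightly larger than the input, which is the kind of result you would see if no real compression were applied.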

That said, if zip really isn't an option, your last resort could be to call the 7z command-line tool directly from your program.

If pure 7z is not mandatory, another option would be to use a "tgz"-like format to emulate solid compression: first pack all files into a single uncompressed file (e.g. tar format, or a zip file with no compression), then compress that single file with the standard Java Deflate algorithm (for example by zipping it). Of course, that is only viable if the format is recognized by the processes that consume it later.
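
A minimal JDK-only sketch of that tgz-like approach, assuming the files are already in memory as a map from entry name to bytes (reading them from disk is left out). Instead of tar it writes an uncompressed (STORED) zip and wraps the whole stream in a GZIPOutputStream, so Deflate compresses all entries as one continuous stream. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.CRC32;
import java.util.zip.GZIPOutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class StoredZipThenGzip {

    // Writes all files as STORED (uncompressed) zip entries and gzips the whole
    // zip stream, so Deflate sees every file as one continuous input.
    // Closing the ZipOutputStream also finishes the gzip stream and closes target.
    static void write(Map<String, byte[]> files, OutputStream target) throws IOException {
        try (ZipOutputStream zip = new ZipOutputStream(new GZIPOutputStream(target))) {
            zip.setMethod(ZipOutputStream.STORED);
            for (Map.Entry<String, byte[]> file : files.entrySet()) {
                byte[] content = file.getValue();
                // STORED entries must declare their size and CRC-32 before writing
                CRC32 crc = new CRC32();
                crc.update(content);
                ZipEntry entry = new ZipEntry(file.getKey());
                entry.setSize(content.length);
                entry.setCompressedSize(content.length);
                entry.setCrc(crc.getValue());
                zip.putNextEntry(entry);
                zip.write(content);
                zip.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample: 50 small text files with mostly identical content
        StringBuilder body = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            body.append("same boilerplate line\n");
        }
        Map<String, byte[]> files = new LinkedHashMap<>();
        for (int i = 0; i < 50; i++) {
            String text = "report " + i + "\n" + body;
            files.put("20180101/file" + i + ".txt", text.getBytes(StandardCharsets.UTF_8));
        }
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        write(files, result);
        System.out.println("zip.gz size: " + result.size() + " bytes");
    }
}
```

The result is a .zip.gz, so a consumer has to gunzip it before opening the zip, which is exactly the format-recognition caveat above.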

Upvotes: 5

user3408531

Reputation:

Use the 7-Zip file archiver instead; it compresses an 832 KB file to 26.0 KB easily:

  1. Get its Jar and SDK.
  2. Choose the LZMA-compression-related .java files.
  3. Add run arguments to the project properties: e "D:\\2017ASP.pdf" "D:\\2017ASP.7z", where e stands for encode, followed by "input path" "output path".
  4. Run the project [LzmaAlone.java].

Results

Case 1 (.pdf file): from 33,969 KB to 24,645 KB.

Case 2 (.docx file): from 832 KB to 26.0 KB.

Upvotes: 5

df778899

Reputation: 10931

Commons Compress is starting a new block in the container file for each archive entry. Note the block counter here:

[Screenshot: the block counter shows a new block for each file]

Not quite the answer you were hoping for, but the docs say it doesn't support "solid compression", i.e. writing several files into a single block. See paragraph 5 in the docs here.
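
The effect of solid compression can be reproduced with plain java.util.zip.Deflater by comparing two strategies: compressing each file on its own (analogous to one block per entry) versus concatenating everything and compressing it once. This is only an analogy, assuming files that share a lot of content; Deflate has a 32 KB window while 7-Zip's LZMA uses a much larger dictionary, so the real gap can be bigger. Class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.zip.Deflater;

public class SolidVsPerFile {

    // Deflates one byte array at the best level and returns the compressed length.
    static int deflatedSize(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            deflater.setInput(input);
            deflater.finish();
            byte[] buf = new byte[8192];
            int total = 0;
            while (!deflater.finished()) {
                total += deflater.deflate(buf);
            }
            return total;
        } finally {
            deflater.end();
        }
    }

    // Non-solid: every file is compressed independently, so nothing learned
    // from one file helps with the next.
    static int perFileTotal(List<byte[]> files) {
        int total = 0;
        for (byte[] f : files) {
            total += deflatedSize(f);
        }
        return total;
    }

    // Solid: all files are concatenated and compressed as one stream, so content
    // repeated across files is encoded as back-references (within the window).
    static int solidTotal(List<byte[]> files) {
        ByteArrayOutputStream all = new ByteArrayOutputStream();
        for (byte[] f : files) {
            all.write(f, 0, f.length);
        }
        return deflatedSize(all.toByteArray());
    }

    public static void main(String[] args) {
        // 100 hypothetical files sharing the same random 400-character header
        Random rnd = new Random(42);
        StringBuilder header = new StringBuilder();
        for (int i = 0; i < 400; i++) {
            header.append((char) ('a' + rnd.nextInt(26)));
        }
        List<byte[]> files = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            files.add((header + " file " + i).getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("per-file total: " + perFileTotal(files) + " bytes");
        System.out.println("solid total:    " + solidTotal(files) + " bytes");
    }
}
```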

A quick look around found a few other Java libraries that support LZMA compression, but I couldn't spot one that could do so within the parent container file format for 7-Zip. Perhaps someone else knows of an alternative...

It sounds like a normal zip file format (e.g. via ZipOutputStream) is not an option?
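
If a regular zip is acceptable, a sketch with java.util.zip.ZipOutputStream at the highest Deflate level could look like this. It assumes Java 8+ for Files.walk and keeps a layout similar to the question's (a top-level folder named after the date); the class name is made up. Zip still compresses each entry separately, so it will not match a solid 7z archive either.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.Deflater;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class MaxLevelZip {

    // Zips every regular file under sourceDir into zipFile using Deflate level 9.
    // Entry names are relative to sourceDir's parent (so they start with the
    // folder name) and use '/' as separator. Empty directories are not stored.
    static void zipFolder(Path sourceDir, Path zipFile) throws IOException {
        Path source = sourceDir.toAbsolutePath().normalize();
        Path base = source.getParent();
        List<Path> files;
        try (Stream<Path> walk = Files.walk(source)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        try (ZipOutputStream zip = new ZipOutputStream(Files.newOutputStream(zipFile))) {
            zip.setLevel(Deflater.BEST_COMPRESSION);
            for (Path file : files) {
                String name = base.relativize(file).toString().replace('\\', '/');
                zip.putNextEntry(new ZipEntry(name));
                Files.copy(file, zip); // stream the file content into the entry
                zip.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java MaxLevelZip <sourceFolder> <output.zip>");
            return;
        }
        zipFolder(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```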

Upvotes: 8
