Ballyhoo

Reputation: 1

How can I pre-calculate the end filesize of a batch of compressed files in Bash?

I have about 15GB of assorted files in a directory that need to be compressed into a batch of zipped files, each of which has to be smaller than 500MB. I've got a pretty simple Bash script that takes a maximum number of files per batch and compresses them into zip files; it works fine, but since the individual files vary widely in size, I end up with many more zipped files than I'd like.

Is there a relatively efficient Bash script to calculate the number of files to be compressed into each archive, based on a set maximum archive size?

A script that calculates the total size of all files in the "latest batch" and assigns that block of files for compression when a preset total is reached would also work if you've got one, but now I'm curious whether there's an optimal solution.

Here's the script I've got, with comments:

#!/bin/bash
 
# Set the directory path to the folder containing the files to be compressed
DIR_PATH="/path"
 
# Set the name prefix of the output archive files
ARCHIVE_PREFIX="archive"
 
# Set the maximum number of files per batch
MAX_FILES=1000
 
# Change directory to the specified path
cd "$DIR_PATH" || exit 1
 
# Get a list of all files in the directory
files=( * )
 
# Calculate the number of batches of files
num_batches=$(( (${#files[@]} + $MAX_FILES - 1) / $MAX_FILES ))
 
# Loop through each batch of files
for (( i=0; i<$num_batches; i++ )); do
    # Set the start index of the batch; the array slice below
    # stops at the end of the array automatically
    start=$(( i * MAX_FILES ))
     
    # Create a compressed archive file for the batch of files
    # (tar -czf produces a gzipped tar, not a zip, so name it accordingly)
    archive_name="${ARCHIVE_PREFIX}_${i}.tar.gz"
    tar -cvzf "$archive_name" "${files[@]:$start:$MAX_FILES}"
done
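
Before switching strategies, it can help to see how unevenly the fixed-count batches come out. Here is a small sketch (the `batch_sizes` helper is my own name, not part of the script; it assumes `MAX_FILES` is set as above and GNU `stat` is available) that prints each batch's total uncompressed size using the same slicing:

```shell
# Hypothetical helper: print the total uncompressed bytes per fixed-count
# batch, sliced the same way as the script above. Assumes MAX_FILES is set.
batch_sizes() {
    local files=( "$@" ) max=$MAX_FILES i sum f
    for (( i = 0; i < ${#files[@]}; i += max )); do
        sum=0
        for f in "${files[@]:i:max}"; do
            (( sum += $(stat -c %s "$f") ))   # GNU stat: size in bytes
        done
        echo "batch $(( i / max )): $sum bytes"
    done
}
```

Running `batch_sizes "${files[@]}"` before compressing shows how widely the per-batch input sizes spread, which is the root cause of the uneven archive count.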

Upvotes: 0

Views: 55

Answers (2)

Armali

Reputation: 19395

A script that calculates the total size of all files in the "latest batch" and assigns that block of files for compression when a preset total is reached would also work

That's rather easy, but requires lucky guessing of that preset total.

…   # set needed variables as in your script

archive()
{
    # Create a compressed archive file for the batch of files
    # (.tar.gz, since tar -czf produces a gzipped tar rather than a zip)
    archive_name="${ARCHIVE_PREFIX}_$((i++)).tar.gz"
    echo tar -cvzf "$archive_name" "${files[@]:start:end-start}"    # drop 'echo' to really run it
    start=$end
}

# Loop through the files
for file in "${files[@]}"
do  let sum+=$(stat -c %s "$file")      # accumulate the file sizes
    let ++end
    if (( sum >= 1000000000 )); then    # compare with preset total
        sum=0
        archive
    fi
done
if ((end-start)); then archive; fi      # final batch
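
Note that the loop above only closes a batch *after* the running total has crossed the preset total, so a batch's uncompressed input can overshoot it by up to one file's size. If you want a hard cap on the input instead, a variant (sketch only; `batch_by_size` and `LIMIT` are my names, not from the answer) can start a new batch *before* the total would exceed the limit:

```shell
# Sketch: read "size name" pairs on stdin and print "batch_index name",
# starting a new batch *before* the running total would exceed LIMIT.
# A single file larger than LIMIT still gets a batch of its own.
LIMIT=${LIMIT:-1000000000}

batch_by_size() {
    local sum=0 batch=0 size name
    while read -r size name; do
        if (( sum > 0 && sum + size > LIMIT )); then
            (( batch++ ))
            sum=0
        fi
        (( sum += size ))
        printf '%s %s\n' "$batch" "$name"
    done
}
```

Feed it e.g. `stat -c '%s %n' * | batch_by_size` and archive each batch's names. This still only bounds the *uncompressed* input per archive; as the other answer notes, the compressed size can't be known without actually compressing.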

Upvotes: 0

Mark Adler

Reputation: 112502

You would have to define "relatively efficient", but the only way to know what the compressed size will be is to do the compression. There is no "calculation" that can be done.
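
That said, a rough *estimate* is possible by compressing a representative sample and extrapolating. A sketch (the `estimate_ratio` helper is hypothetical; GNU `stat` and `gzip` are assumed):

```shell
# Hypothetical helper: print the gzip compressed/original size ratio for
# one file. Multiplying a batch's uncompressed total by the ratio of a
# representative sample gives a rough guess at the archive size -- only
# a heuristic, since the ratio varies from file to file.
estimate_ratio() {
    local orig comp
    orig=$(stat -c %s "$1")        # GNU stat: original size in bytes
    comp=$(gzip -c "$1" | wc -c)   # compressed size in bytes
    awk -v o="$orig" -v c="$comp" 'BEGIN { printf "%.3f\n", c / o }'
}
```

Used that way it is a heuristic for picking the preset total or `MAX_FILES`, not a calculation of the final size.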

Upvotes: 2
