AdHominem

Reputation: 1232

How do I recursively unzip nested ZIP files?

Suppose a secret file is buried deep inside a nested ZIP file, i.e. a zip file inside a zip file inside a zip file, and so on.

The zip files are named 1.zip, 2.zip, 3.zip, etc.

We don't know how deeply the zip files are nested, but it may be thousands of levels.

What would be the easiest way to loop through all of them, down to the last one, to read the secret file?

My initial approach would have been to call unzip recursively, but my Bash skills are limited. What are your ideas for solving this?

Upvotes: 9

Views: 29134

Answers (6)

pangratt12345

Reputation: 95

Here is a slightly different approach. This script unzips recursively and preserves the folder hierarchy inside the zip file instead of unzipping everything into the current directory.
It also handles somewhat pathological cases, such as a zip that contains many zips, or zips and folders alternating inside one archive. Before extracting, it checks that the provided zip file exists and has a .zip extension.
Call the script with the full path of the zip file, like this:
path_to_script/zip_file_extractor.sh path_to_zip_file/zip_file.zip

The iteration over files and folders is based on this tutorial, to avoid problems with whitespace, newlines, and other special characters in names:
http://mywiki.wooledge.org/BashFAQ/001
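
As a small standalone illustration of the loop pattern from that FAQ (not part of the script itself), the sketch below creates a scratch directory with a few made-up file names, one of them containing a space, and counts the zip files using find -print0 and read -d ''. The directory and file names are assumptions chosen for the demo.

```shell
#!/bin/bash
# Standalone demo: iterate over file names safely, even with spaces in them.
demo_dir=$(mktemp -d)   # hypothetical scratch directory
touch "$demo_dir/plain.zip" "$demo_dir/with space.zip" "$demo_dir/notes.txt"

count=0
# Process substitution keeps the loop in the current shell, so $count survives.
while IFS= read -r -d '' zipfile; do
  printf 'found: %s\n' "$zipfile"
  count=$((count + 1))
done < <(find "$demo_dir" -mindepth 1 -maxdepth 1 -type f -iname '*.zip' -print0)

echo "zip files seen: $count"
rm -r "$demo_dir"
```

Because each name arrives as one NUL-terminated record, "with space.zip" is handled as a single file, provided the variable is quoted wherever it is used afterwards.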

#!/bin/bash

function unzip_file() {
  local zipfile=$1; # zip file passed as first parameter to this function
  # unzip zip file to a directory with removed extension (.zip);
  # remove the redundant zip file only if extraction succeeded
  unzip "${zipfile}" -d "${zipfile%.*}" && rm "${zipfile}"; # %.* removes file extension
}

function extract_zips_in_current_dir_level() {
  find . -mindepth 1 -maxdepth 1 -type f -iname '*.zip' -print0 | 
  while IFS= read -r -d '' zipfile; do 
    unzip_file "$zipfile"; # quoted so names with spaces stay intact
  done
}

function extract_zips_recursively() {
  local folder=$1; # folder passed as first parameter to this function
  pushd "${folder}" > /dev/null || return; # skip folders that cannot be entered
  extract_zips_in_current_dir_level; # this can generate new folders after unzipping
  find . -mindepth 1 -maxdepth 1 -type d -print0 | 
  while IFS= read -r -d '' directory; do
    extract_zips_recursively "$directory"; # call this function recursively for every subdirectory
  done
  popd > /dev/null;
}

if [[ $# -lt 1 ]]; then # if fewer than 1 input parameter was passed to this script
  echo "This script needs a zip file with its full path provided as a parameter" >&2;
  exit 1;
else
  input_zip_file=$1; # zip file with full path to be extracted into the same path
  if [[ ${input_zip_file##*.} != "zip" ]]; then # if input file doesn't have .zip extension
    echo "The file provided as the first parameter should have a .zip extension" >&2;
    exit 1;
  else
    if [ ! -f "$input_zip_file" ]; then # if provided zip file doesn't exist in filesystem
      echo "The zip file provided as the first parameter doesn't exist in the filesystem" >&2;
      exit 1;
    fi
  fi
fi

unzip_file "$input_zip_file";
unzipped_folder="${input_zip_file%.*}"; # %.* removes file extension
if [ -d "$unzipped_folder" ]; then # check if file got unzipped correctly
  extract_zips_recursively "$unzipped_folder";
fi
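
The two parameter expansions the script leans on can be tried on their own. This is a minimal sketch using made-up names. Note that ${name##*.} returns the whole name when there is no dot at all, so the extension check really compares whatever follows the last dot.

```shell
#!/bin/bash
# Standalone demo of the parameter expansions used in the script above.
path="archives/my data.v2.zip"   # hypothetical example path

without_ext="${path%.*}"   # strip the shortest trailing ".something"
extension="${path##*.}"    # strip the longest leading "anything."

echo "$without_ext"   # prints: archives/my data.v2
echo "$extension"     # prints: zip

no_dot="README"            # hypothetical name without any dot
echo "${no_dot##*.}"       # prints: README (nothing to strip)
```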

Upvotes: 2

Mavin

Reputation: 59

Here is a solution for Windows, assuming 7-Zip is installed in the default location.

@echo off
Setlocal EnableDelayedExpansion
Set source=%1
Set SELF=%~dpnx0
For %%Z in (!source!) do (
    set FILENAME=%%~nxZ
)
set FILENAME=%FILENAME:"=%

"%PROGRAMFILES%\7-zip\7z.exe" x -o* -y "%FILENAME%"

REM DEL "%FILENAME%"
rem " This is just to satisfy stackoverflow code formatting!


For %%Z in (!source!) do (
    set FILENAME=%%~nZ
)
for %%a in (zip rar jar z bz2 gz gzip tgz tar lha iso wim cab rpm deb) do (
    
    forfiles /P ^"%FILENAME%^" /S /M *.%%a /C "cmd /c if @isdir==FALSE \"%SELF%\" @path"
)

This has been adapted from here https://social.technet.microsoft.com/Forums/ie/en-US/ccd7172b-85e3-4b4a-ad93-5902e0abd903/batch-file-extracting-all-files-from-nested-archives?forum=ITCG

Notes:

  1. The ~ modifiers work only on FOR variables and batch parameters (such as %1), not on ordinary variables, which is why a dummy for..in loop is used here. If there is a better way please edit.
  2. ~nx reduces the variable to its file name plus extension, without the path.
  3. ~dpnx applied to %0 gives the drive, path, file name and extension, i.e. the full path of the script.
  4. -o* in the 7-Zip command line makes 7-Zip create folder names without the .zip extension, as it does when extracting with a right click in the GUI.
  5. ~n reduces the variable to a file name without an extension, i.e. it drops the .zip.
  6. Note that the escape character (for quotes) in FORFILES /P is ^ (caret), while for CMD /C it is \. This ensures that paths and file names with spaces are also handled recursively without any problem.
  7. Remove the REM from the DEL statement if you want the zip file to be deleted after unzipping.

Upvotes: 0

user930412

Reputation: 11

Check out this Java-based utility, nzip, for nested zips.

Nested zips can be listed, extracted, and compressed with the following commands:

java -jar nzip.jar -c list -s readme.zip 

java -jar nzip.jar -c extract -s "C:\project\readme.zip" -t readme 

java -jar nzip.jar -c compress -s readme -t "C:\project\readme.zip" 

PS. I am the author and will be happy to fix any bugs quickly.

Upvotes: 0

Ronan Boiteau

Reputation: 10138

Probably not the cleanest way, but that should do the trick:

#!/bin/bash
# bash is required: (( )) is not available in plain sh
IDX=1 # ID of your first zip file
while true
do
    unzip "$IDX.zip" || break # Extract; quit if unzip failed (no more files)
    if [ "$IDX" -ne 1 ]
    then
        rm "$IDX.zip" # Remove zip to leave your directory clean
    fi
    (( IDX++ )) # Next file
done

Upvotes: 0

AdHominem

Reputation: 1232

Thanks Cyrus! The master wizard Shawn J. Goff had the perfect script for this:

while [ "$(find . -type f -name '*.zip' | wc -l)" -gt 0 ]; do find . -type f -name '*.zip' -exec unzip -- '{}' \; -exec rm -- '{}' \;; done
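
One thing to watch with the one-liner: it keeps looping as long as any zip file remains, so an archive that fails to extract (and is therefore never removed) would make it spin forever. Below is a rough multi-line sketch of the same idea that stops on the first failure. The function name unzip_all_nested is made up for this example, and -q (quiet) is an addition of mine, not part of the original command.

```shell
#!/bin/bash
# Sketch: repeat "extract every zip, then delete it" until none are left.
# unzip_all_nested is a hypothetical helper name, not from the original answer.
unzip_all_nested() {
  local zipfile found
  while :; do
    found=0
    while IFS= read -r -d '' zipfile; do
      found=1
      # Like the one-liner, extract into the current directory.
      if unzip -q "$zipfile"; then
        rm -- "$zipfile"
      else
        echo "could not extract $zipfile, stopping" >&2
        return 1
      fi
    done < <(find . -type f -name '*.zip' -print0)
    # A pass that saw no zip files means everything has been unpacked.
    [ "$found" -eq 0 ] && return 0
  done
}
```

Run it from the directory that holds 1.zip, for example with: unzip_all_nested && ls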

Upvotes: 10

silverdrop

Reputation: 71

Here's my 2 cents.

#!/bin/bash

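function extract(){
  local dir="${1%.zip}" prev="$PWD" zip
  # unzip into a folder named after the archive, run the optional clean-up
  # command given as $2, then descend into the new folder
  unzip "$1" -d "$dir" && eval "$2" && cd "$dir" || return
  for zip in *.zip; do
    [ -f "$zip" ] && extract "$zip" 'rm "$1"'
  done
  cd "$prev" # go back so sibling archives are found
}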

extract '1.zip'

Upvotes: 5
