Reputation: 1000
I have a 2 GB archive (preferably .zip or .rar) split into parts (say 100 parts x 20 MB), and I am trying to find a way to unpack it properly. I started with a .zip archive; I had files like test.zip, test.z01, test.z02 ... test.z99, etc. When I merge them in Python like this:
for zipName in zips:
    with open(os.path.join(path_to_zip_file, "test.zip"), "ab") as f:
        with open(os.path.join(path_to_zip_file, zipName), "rb") as z:
            f.write(z.read())
and then, after merging, unpack it like this:
with zipfile.ZipFile(os.path.join(path_to_zip_file, "test.zip"), "r") as zipObj:
    zipObj.extractall(path_to_zip_file)
I get errors like:
test.zip file isn't zip file.
So then I tried with a .rar archive. I tried to unpack just the first file to see if my code would intelligently look for and pick up the remaining archive fragments, but it did not. So again I merged the .rar files (just like in the .zip case), and then tried to unpack the result using patoolib:
patoolib.extract_archive("test.rar", outdir="path here")
When I do that, I get errors like:
patoolib.util.PatoolError: could not find an executable program to extract format rar; candidates are (rar,unrar,7z)
After some work I figured out that the merged files are corrupted (I copied one over and tried to unpack it normally on Windows using WinRAR, and ran into problems). So I tried other ways to merge, for example using cat:
cat test.part.* >test.rar
but that didn't help either.
How can I merge and then unpack these archive files properly in Python?
Upvotes: 0
Views: 3621
Reputation: 1940
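Rename the files: .zip to .zip.001, .z01 to .zip.002, and so on. Then call 7z on the .001 file (7z x test.zip.001):
import subprocess
cmd = ['7z', 'x', 'test.zip.001']
sp = subprocess.Popen(cmd, stderr=subprocess.STDOUT, stdout=subprocess.PIPE)
output, _ = sp.communicate()  # wait for 7z to finish and capture its output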
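A slightly more defensive variant of the call above might look like the sketch below. It assumes 7z is installed and on PATH (a missing executable is the same kind of problem patool reports in the question). The helper names build_extract_command and extract_with_7z, and the out_dir argument, are made up for illustration and are not part of any library.

```python
import shutil
import subprocess


def build_extract_command(first_part, out_dir, exe="7z"):
    """Build the argument list for `7z x <first_part> -o<out_dir> -y`."""
    # 7z expects -o and the directory with no space in between; -y answers
    # "yes" to every prompt so the call cannot block waiting for input.
    return [exe, "x", first_part, "-o" + out_dir, "-y"]


def extract_with_7z(first_part, out_dir):
    """Run 7z on the first volume (hypothetical helper)."""
    exe = shutil.which("7z")
    if exe is None:
        raise FileNotFoundError("7z executable not found on PATH")
    result = subprocess.run(
        build_extract_command(first_part, out_dir, exe),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    # 7z returns a non-zero exit code when extraction fails.
    if result.returncode != 0:
        raise RuntimeError("7z failed:\n" + result.stdout)
    return result.stdout
```

Usage would be something like extract_with_7z("test.zip.001", "unpacked"), which raises instead of failing silently if 7z is missing or reports an error.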
Running cat test.zip* > test.zip should also work, but not always, in my experience. I tried it with a single file and it worked, but it failed with subfolders. Keeping the parts in the right order is mandatory.
Testing:
7z -v1m a test.zip 12MFile
cat test.zip* > test.zip
7z t test.zip
>> Everything is Ok
I can't check this with files from the "official" WinRAR (does this even still exist?!) or WinZip.
If you want to stay in Python, this works too (again, for my 7z test files):
import shutil
import glob
with open('output_file.zip', 'wb') as wfd:
    # Find all parts matching the pattern; sort them so they are joined in order
    for f in sorted(glob.glob('test.zip.*')):
        with open(f, 'rb') as fd:
            shutil.copyfileobj(fd, wfd)  # Concatenate
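Since the order of the parts matters, a sketch that sorts them by numeric suffix, merges them, and checks the result with the standard zipfile module before extracting could look like this. It assumes the parts are a plain byte-wise split named like test.zip.001, test.zip.002, and so on (as produced by 7z -v in the test above). The names part_number, merge_parts and extract_merged are hypothetical.

```python
import glob
import os
import shutil
import zipfile


def part_number(path):
    """Return the numeric suffix of a part name, e.g. 'test.zip.012' -> 12."""
    return int(path.rsplit(".", 1)[1])


def merge_parts(pattern, merged_path):
    """Concatenate all parts matching `pattern` into `merged_path`, in order."""
    # Keep only names with a purely numeric suffix, so a stray file that
    # happens to match the pattern is not glued into the archive.
    parts = [p for p in glob.glob(pattern) if p.rsplit(".", 1)[-1].isdigit()]
    # Sort numerically so that part 10 comes after part 9 even without padding.
    parts.sort(key=part_number)
    with open(merged_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    return parts


def extract_merged(merged_path, out_dir):
    """Verify the merged archive, then extract it; raises if it looks damaged."""
    if not zipfile.is_zipfile(merged_path):
        raise ValueError(merged_path + " is not a valid zip file")
    with zipfile.ZipFile(merged_path) as zf:
        bad = zf.testzip()  # name of the first corrupt member, or None
        if bad is not None:
            raise ValueError("corrupt member: " + bad)
        zf.extractall(out_dir)
```

For example, merge_parts("test.zip.*", "merged.zip") followed by extract_merged("merged.zip", "unpacked"). The check with is_zipfile catches a bad merge early instead of failing halfway through extraction.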
pyunpack (Python frontend) with patool (Python backend), plus an installed unrar or p7zip-rar (7z with the non-free rar support) on Linux, or 7z on Windows, can handle zip and rar (and many more) from Python.
7z x has a -t flag to explicitly mark the input as a split archive (this may help if the file is not named .001). Pass it as e.g. 7z x -trar.split or 7z x -tzip.split or something.
Upvotes: 4