Kate

Reputation: 133

Python: Continuously check size of files being added to list, stop at size, zip list, continue

I am trying to loop through a directory, check the size of each file, and add the files to a list until they reach a certain size (2040 MB). At that point, I want to put the list into a zip archive, and then continue looping through the next set of files in the directory and continue to do the same thing. The other constraint is that files with the same name but different extension need to be added together into the zip, and can't be separated. I hope that makes sense.

The issue I am having is that my code basically ignores the size constraint that I have added, and just zips up all the files in the directory anyway.

I suspect there is some logic issue, but I am failing to see it. Any help would be appreciated. Here is my code:

import os,os.path, zipfile
from time import *

#### Function to create zip file ####
# Add the files from the list to the zip archive
def zipFunction(zipList):

    # Specify zip archive output location and file name
    zipName = "D:\Documents\ziptest1.zip"

    # Create the zip file object
    zipA = zipfile.ZipFile(zipName, "w", allowZip64=True)  

    # Go through the list and add files to the zip archive
    for w in zipList:

        # Create the arcname parameter for the .write method. Otherwise  the zip file
        # mirrors the directory structure within the zip archive (annoying).
        arcname = w[len(root)+1:]

        # Write the files to a zip
        zipA.write(w, arcname, zipfile.ZIP_DEFLATED)

    # Close the zip process
    zipA.close()
    return       
#################################################
#################################################

sTime = clock()

# Set the size counter
totalSize = 0

# Create an empty list for adding files to count MB and make zip file
zipList = []

tifList = []

xmlList = []

# Specify the directory to look at
searchDirectory = r"Y:\test"  # raw string, so "\t" is not read as a tab

# Create a counter to check number of files
count = 0

# Set the root, directory, and file name
for root,direc,f in os.walk(searchDirectory):

        #Go through the files in directory
        for name in f:
            # Set the os.path file root and name
            full = os.path.join(root,name)

            # Split the file name from the file extension
            n, ext = os.path.splitext(name)

            # Get size of each file in directory, size is obtained in BYTES
            fileSize = os.path.getsize(full)

            # Add up the total sizes for all the files in the directory
            totalSize += fileSize

            # Convert from bytes to megabytes
                # 1 kilobyte = 1,024 bytes
                # 1 megabyte = 1,048,576 bytes
                # 1 gigabyte = 1,073,741,824 bytes
            megabytes = float(totalSize)/float(1048576)

            if ext == ".tif":  # should be everything that is not equal to XML (could be TIF, PDF, etc.) need to fix this later
                tifList.append(n)#, fileSize/1048576])
                tifSorted = sorted(tifList)
            elif ext == ".xml":
                xmlList.append(n)#, fileSize/1048576])
                xmlSorted = sorted(xmlList)

            if full.endswith(".xml") or full.endswith(".tif"):
                zipList.append(full)

            count +=1

            if megabytes == 2040 and len(tifList) == len(xmlList):
                zipFunction(zipList)
            else:
                continue

eTime = clock()
elapsedTime = eTime - sTime
print "Run time is %s seconds"%(elapsedTime)

The only thing I can think of is that there is never an instance where my variable megabytes==2040 exactly. I can't figure out how to make the code stop at that point otherwise though; I wonder if using a range would work? I also tried:

    if megabytes < 2040:
       zipList.append(full) 
       continue 
    elif megabytes == 2040:
       zipFunction(zipList)

Upvotes: 1

Views: 2895

Answers (1)

PM 2Ring

Reputation: 55479

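Your main problem is that you need to reset your file size tally when you archive the current list of files. You also need to test with >= rather than ==, since the running total will almost never be exactly 2040. For example: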

if megabytes >= 2040:
    zipFunction(zipList)
    totalSize = 0

BTW, you don't need

else:
    continue 

there, since it's the end of the loop.
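
One caveat with that snippet: zipList also has to be emptied after each archive is written, and zipFunction always writes to the same zipName in "w" mode, so each call would overwrite the previous archive. Here is a minimal Python 3 sketch of the reset pattern. The names zip_in_batches and write_archive and the batch_001.zip naming scheme are made up for illustration and are not part of your code. Files are stored flat by base name, which assumes base names are unique.

```python
import os
import zipfile


def write_archive(zip_name, batch):
    """Write every path in batch to a new zip archive, stored flat by base name."""
    with zipfile.ZipFile(zip_name, "w", zipfile.ZIP_DEFLATED, allowZip64=True) as archive:
        for path in batch:
            archive.write(path, os.path.basename(path))


def zip_in_batches(paths, limit_bytes, out_dir):
    """Zip paths into numbered archives, closing each once the tally reaches limit_bytes."""
    archives = []
    batch = []
    total_size = 0
    for path in paths:
        batch.append(path)
        total_size += os.path.getsize(path)
        if total_size >= limit_bytes:
            # Hypothetical naming scheme: batch_001.zip, batch_002.zip, ...
            zip_name = os.path.join(out_dir, "batch_%03d.zip" % (len(archives) + 1))
            write_archive(zip_name, batch)
            archives.append(zip_name)
            batch = []        # start a fresh list for the next archive
            total_size = 0    # and reset the size tally
    if batch:
        # Files left over when the loop ends still need an archive.
        zip_name = os.path.join(out_dir, "batch_%03d.zip" % (len(archives) + 1))
        write_archive(zip_name, batch)
        archives.append(zip_name)
    return archives
```

Because the size is checked after the file is added, each archive can end up slightly over the limit. The leftover check after the loop matters too: without it, the last partial batch is never zipped.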

As for the constraint that you need to keep files together that have the same main file name but different extensions, the only fool-proof way to do that is to sort the file names before processing them.
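
A rough Python 3 sketch of that sorting step, using itertools.groupby. The function names stem and group_by_stem are hypothetical. It sorts on the name without its extension rather than on the full name, because a file such as a.u.tif would otherwise sort between a.tif and a.xml and split that pair.

```python
import os
from itertools import groupby


def stem(path):
    """Return the path without its final extension, e.g. 'dir/a.tif' -> 'dir/a'."""
    return os.path.splitext(path)[0]


def group_by_stem(paths):
    """Return lists of paths that share the same name apart from the extension."""
    # Sort by stem first so groupby sees each group as one consecutive run;
    # the full path is a tie-breaker that keeps the order inside a group stable.
    ordered = sorted(paths, key=lambda p: (stem(p), p))
    return [list(group) for _, group in groupby(ordered, key=stem)]


if __name__ == "__main__":
    print(group_by_stem(["b.xml", "a.tif", "b.tif", "a.xml"]))
```

If you feed it the full paths from os.path.join(root, name), files only end up in the same group when they are in the same directory as well.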

If you want to guarantee that the total file size in each archive is under the limit, you need to test the size before you add the file(s) to the list. For example:

if (totalSize + fileSize) // 1048576 > 2040:
    zipFunction(zipList)
    totalSize = 0

totalSize += fileSize

That logic will need to be modified slightly to handle keeping a group of files together: you'll need to add the filesizes of each file in the group together into a sub-total, and then see if adding that sub-total to totalSize takes it over the limit.
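
The sub-total idea could look roughly like this in Python 3. It assumes the files have already been collected into groups of (path, size) pairs. batch_groups and LIMIT_BYTES are hypothetical names, and the comparison is done in bytes rather than in whole megabytes.

```python
LIMIT_BYTES = 2040 * 1048576  # the 2040 MB limit from the question, in bytes


def batch_groups(groups, limit_bytes=LIMIT_BYTES):
    """Pack groups of (path, size) pairs into batches without splitting a group.

    Each batch is a list of paths whose sizes add up to at most limit_bytes,
    unless a single group is larger than the limit on its own.
    """
    batches = []
    current = []
    total_size = 0
    for group in groups:
        sub_total = sum(size for _, size in group)
        # Close the current batch first if this group would take it over the limit.
        if current and total_size + sub_total > limit_bytes:
            batches.append(current)
            current = []
            total_size = 0
        current.extend(path for path, _ in group)
        total_size += sub_total
    if current:
        # The last partial batch still has to be returned.
        batches.append(current)
    return batches
```

Each returned batch can then be passed to the zip function in turn. A group that is bigger than the limit by itself still gets its own batch, so that case would need separate handling if it can occur.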

Upvotes: 2
