Reputation: 33
I am attempting to use os.walk to create a list of files per subdirectory, and, execute a function to merge all pdf's in each directory list. The current script appends subsequent directories to the existing list with each loop. So, pdfs in directory1 are merged successfully, but, the list for directory2 includes the pdfs from directory1 etc. I want it to refresh the list of files for each directory. Here is the script I am using currently:
import PyPDF2
import os
import sys
if len(sys.argv) > 1:
SearchDirectory = sys.argv[1]
print("I'm looking for PDF's in ", SearchDirectory)
else:
print("Please tell me the directory to look in")
sys.exit()
pdfWriter = PyPDF2.PdfFileWriter()
for root, dirs, files in os.walk(SearchDirectory):
dirs.sort()
for file in files:
files.sort()
pdfFiles = []
if file.endswith('.pdf') and ((os.path.basename(root)) == "frames"):
print("Discovered this pdf: ", os.path.join(root, file))
pdfFiles.append(os.path.join(root, file))
if pdfFiles:
for file in pdfFiles:
pdfFileObj = open(file, 'rb')
pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
for pageNum in range(0, pdfReader.numPages):
pageObj = pdfReader.getPage(pageNum)
pdfWriter.addPage(pageObj)
pdfOutput = open((os.path.split(os.path.realpath(root))[0]) + ".pdf", "wb")
pdfWriter.write(pdfOutput)
pdfOutput.close()
print("The following pdf has been successfully appended:", os.path.join(root, file))
else:
print("No pdfs found in this directory:", root)
Upvotes: 1
Views: 200
Reputation: 338148
The os.walk
loop iterates once per directory. So you want to create a new PDFWriter for every directory.
It's also a good idea to use continue
to bail out of the loop as soon as possible, this keeps the nesting flat.
Names that start with a capital letter are reserved for classes, so it should be searchDirectory
, written with a small s
.
Finally, take advantage of with
blocks for handling I/O - they automatically call .close()
for you.
I'm not going to install PyPDF2
just for this question, but this approach looks reasonable:
for root, dirs, files in os.walk(searchDirectory):
if not os.path.basename(root) == "frames":
continue
pdfFiles = [os.path.join(root, file) for file in sorted(files)]
if not pdfFiles:
continue
pdfWriter = PyPDF2.PdfFileWriter()
outputFile = os.path.split(os.path.realpath(root))[0] + ".pdf"
for file in pdfFiles:
print("Discovered this pdf:", file)
with open(file, 'rb') as pdfInput:
pdfReader = PyPDF2.PdfFileReader(pdfInput)
for page in pdfReader.pages:
pdfWriter.addPage(page)
with open(outputFile, "wb") as pdfOutput:
pdfWriter.write(pdfOutput)
print("%s files appended to %s" % (len(pdfFiles), outputFile))
Upvotes: 1