Em Clar

Reputation: 3

How can I remove the first page of multiple PDF files in a directory? (Python)

I need to remove the first page of multiple PDF files in a directory. I am a beginner-level Python user, and I have cobbled together the following code from bits and pieces of other code that I have. However, I cannot get it to work. Does anything jump out at anyone?

from PyPDF2 import PdfFileWriter, PdfFileReader

import os, sys

directory_name = 'emma'


for filename in directory_name:
    print 'name: %s' % filename

    output_file = PdfFileWriter()
    input_handle = open(filename+'.pdf', 'rb')
    input_file = PdfFileReader(input_handle)

    num_pages = input_file.getNumPages()

    print "document has %s pages \n" % num_pages

    for i in xrange(1, num_pages):
        output_file.addPage(input_file.getPage(i))
        print 'added page %s \n' % i

    output_stream = file(filename+'-stripped.pdf','wb')
    output_file.write(output_stream)

    output_stream.close()
    input_handle.close()

Error message:

    input_handle = open(filename+'.pdf', 'rb')
        IOError: [Errno 2] No such file or directory: 'a.pdf'

Upvotes: 0

Views: 2518

Answers (2)

Philipp Stark

Reputation: 1

I adapted the code to Python 3, just in case somebody wants to use it:

from PyPDF2 import PdfWriter, PdfReader

import os, glob

os.chdir(r'data_path')
filename_lst = glob.glob('*.pdf')
print('number of files: {}'.format(len(filename_lst)))

save_path = '...' # if you want to save the results somewhere else

for filename in filename_lst:
    print('name: {}'.format(filename))

    output_file = PdfWriter()

    # Keep the input open until the output is written:
    # page data is read from the input stream at write time.
    with open(filename, 'rb') as input_handle:
        input_file = PdfReader(input_handle)

        num_pages = len(input_file.pages)
        print("document has {} pages \n".format(num_pages))

        # Start at index 1 to skip the first page (index 0)
        for i in range(1, num_pages):
            output_file.add_page(input_file.pages[i])

        # os.path.join adds the path separator that save_path + filename would omit
        with open(os.path.join(save_path, filename), 'wb') as output_stream:
            output_file.write(output_stream)

Upvotes: 0

Thomas Fenzl

Reputation: 4392

Your code iterates over the string "emma" character by character, so it tries to open e.pdf, m.pdf (twice), and a.pdf. Since the error only appears at a.pdf, e.pdf and m.pdf must actually exist, which is interesting enough on its own.
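A minimal illustration of that behaviour, reusing only the `directory_name` value from the question (the `attempted` list is a name made up for this sketch):

```python
directory_name = 'emma'

# Iterating over a str yields its characters one at a time,
# not the entries of a directory with that name.
attempted = [ch + '.pdf' for ch in directory_name]
print(attempted)  # ['e.pdf', 'm.pdf', 'm.pdf', 'a.pdf']
```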

As for your actual problem: you need to use os.listdir or glob to get the filenames that are in the directory.
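A rough sketch of both options, using only the standard library. The helper names `list_pdfs` and `list_pdfs_listdir` are hypothetical, chosen for this example, and the sketch assumes the PDFs sit directly inside the directory and use a lowercase `.pdf` extension:

```python
import glob
import os


def list_pdfs(directory):
    """Return sorted paths of the .pdf files directly inside `directory` (glob variant)."""
    pattern = os.path.join(directory, '*.pdf')
    return sorted(glob.glob(pattern))


def list_pdfs_listdir(directory):
    """Same result via os.listdir, which returns bare names that must be joined to the directory."""
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.endswith('.pdf')
    )
```

With either helper, the loop in the question could become `for filename in list_pdfs(directory_name):`. The returned paths already include the directory and the `.pdf` extension, so the `open` call would no longer need to append `'.pdf'`.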

Upvotes: 1
