Tyler Watson

Reputation: 13

How do I iterate through files in my directory so they can be opened/read using PyPDF2?

I am working on an invoice scraper for work, where I have successfully written all the code to scrape the fields that I need using PyPDF2. However, I am having trouble figuring out how to put this code into a for loop so I can iterate through all the invoices stored in my directory. There could be anywhere from 1 to 250+ files depending on which project I am using this for.

I thought I would be able to use "*.pdf" in place of the pdf name, but it does not work for me. I am relatively new to Python and have not used that many loops before, so any guidance would be appreciated!

import re
import PyPDF2

pdfFileObj = open(r'C:\Users\notylerhere\Desktop\Test Invoices\SampleInvoice.pdf', 'rb')
pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
pageObj = pdfReader.getPage(0)

#Print all text on page
#print(pageObj.extractText())

#Grab Account Number Meter Number
accountNumber = re.compile(r'\d\d\d\d\d-\d\d\d\d\d')
meterNumber = re.compile(r'(\d\d\d\d\d\d\d\d)')
moAccountNumber = accountNumber.search(pageObj.extractText())
moMeterNumber = meterNumber.search(pageObj.extractText())
print('Account Number: '+moAccountNumber.group())
print('Meter Number: '+moMeterNumber.group(1))

Thanks very much! 

Upvotes: 0

Views: 583

Answers (3)

Employee

Reputation: 3233

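import os
import PyPDF2

for el in os.listdir(os.getcwd()):
    if el.lower().endswith(".pdf"):
        # PDFs must be opened in binary mode ('rb') for PyPDF2
        with open(os.path.join(os.getcwd(), el), "rb") as pdf_file:
            pdf_reader = PyPDF2.PdfFileReader(pdf_file)
            # process pdf_reader here, while the file is still open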

Upvotes: 0

Tom

Reputation: 86

Another option is glob:

import glob

files = glob.glob("c:/mydirectory/*.pdf")

for file in files:
    # Do your processing of file here, e.g. open(file, 'rb')
    print(file)

Make sure every line inside the loop (everything after the colon) is indented.
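
As a rough sketch of how the full loop could look: list_pdfs is a hypothetical helper name (it is not part of glob or PyPDF2), and "." is only a stand-in for your own invoice folder.

```python
import glob
import os


def list_pdfs(directory):
    """Return the sorted paths of the .pdf files directly inside directory.

    list_pdfs is a hypothetical helper name used for illustration only.
    """
    # glob.escape keeps characters such as [ or ] in the folder name literal
    pattern = os.path.join(glob.escape(directory), "*.pdf")
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    # "." is an assumption; replace it with your invoice folder
    for path in list_pdfs("."):
        with open(path, "rb") as pdf_file:  # PyPDF2 expects binary mode
            print("Would process:", path)
```

Inside the with block you would hand pdf_file to PyPDF2 instead of printing. Note that on case-sensitive file systems "*.pdf" will not match a file named "INVOICE.PDF".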

Upvotes: 1

Lukas Schmid

Reputation: 1960

You want to iterate over your directory and deal with every file independently.

Several functions can do this, depending on your use case. os.walk is a good place to start because it also descends into subdirectories.

Example:

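import os
for root, directories, files in os.walk('.'):
  for file in files:
    if file.lower().endswith('.pdf'):
      # os.walk yields bare file names, so join with root to get a usable path
      # openAndDoStuff is a placeholder for your own processing function
      openAndDoStuff(os.path.join(root, file))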
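
Since every file is handled on its own, one invoice without a match should not stop the whole run. In the code from the question, search() returns None when nothing matches, and calling .group() on None raises AttributeError. A minimal sketch of a safer version, assuming a hypothetical helper named extract_fields that takes the text you got from extractText():

```python
import re

# Same patterns as in the question, written with {n} repetition counts
ACCOUNT_RE = re.compile(r"\d{5}-\d{5}")
METER_RE = re.compile(r"\d{8}")


def extract_fields(text):
    """Return (account_number, meter_number); each is None when not found.

    extract_fields is a hypothetical helper, not a PyPDF2 function.
    """
    account = ACCOUNT_RE.search(text)
    meter = METER_RE.search(text)
    return (
        account.group() if account else None,
        meter.group() if meter else None,
    )


print(extract_fields("Account 12345-67890 Meter 12345678"))
# -> ('12345-67890', '12345678')
print(extract_fields("no numbers on this page"))
# -> (None, None)
```

You could call this from inside the loop and skip or log any file where a value comes back as None.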

Upvotes: 0
