Autom8

Reputation: 403

How to print the next line in Python with text extracted using pdfplumber

How can I print the line that follows a matched line in text that I extracted from a PDF using pdfplumber's extract_text() function?

I have tried line.next(), but it does not work.

The actual job name is on the line after "Job Name", as in the example below.

Job Name

Albany Mall Development

My code is below.

import re

import pdfplumber

jobName_re = re.compile(r'(Job Name)')
siteAddress_re = re.compile(r'(Wellington\s)(.+)')
file = 'invoices.pdf'

lines = []

with pdfplumber.open(file) as myPdf:
    for page in myPdf.pages:
        text = page.extract_text()
        for line in text.split('\n'):
            jobName = jobName_re.search(line)
            siteAddress = siteAddress_re.search(line)
            if jobName:
                print('The next line that follows Job Name is', line.next())
            elif siteAddress:
                print(siteAddress.group(1))

Upvotes: 1

Views: 1638

Answers (1)

Matthew Strawbridge

Reputation: 20640

You have several options.

Option 1

You could switch to using an integer index to loop through the lines:

lines = text.split('\n')
for i in range(len(lines)):
    line = lines[i]

Then you can access the following line as lines[i+1], as long as i is not the last index (otherwise it raises an IndexError).
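
A minimal, self-contained sketch of this index-based approach is shown here. The function name job_names_by_index and the sample string are hypothetical; the sample stands in for whatever page.extract_text() returns for your PDF. The bounds check covers the case where "Job Name" is the last line on a page.

```python
import re

job_name_re = re.compile(r'Job Name')


def job_names_by_index(text):
    """Return the line that follows each 'Job Name' heading."""
    lines = text.split('\n')
    found = []
    for i, line in enumerate(lines):
        # Only look ahead when a following line actually exists.
        if job_name_re.search(line) and i + 1 < len(lines):
            found.append(lines[i + 1].strip())
    return found


# Hypothetical sample standing in for page.extract_text() output.
sample = 'Invoice 1\nJob Name\nAlbany Mall Development\nWellington 6011'
print(job_names_by_index(sample))
```

Using enumerate here is equivalent to range(len(lines)), but gives you the index and the line together.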

Option 2

Set a flag to say you've seen the heading for job name, then pick it up next time round the loop. Something like this:

        last_was_job_heading = False
        for line in text.split('\n'):
            siteAddress = siteAddress_re.search(line)
            if last_was_job_heading:
                print('The next line that follows Job Name is', line)
            elif siteAddress:
                print(siteAddress.group(1))
            last_was_job_heading = jobName_re.search(line)

Option 3

Don't split the text into lines at all. Instead use smarter regular expressions to parse multiple lines at once.
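
As a rough sketch of this option, a single pattern can match the heading, the line break, and the line that follows it. The pattern and the sample string are illustrative assumptions rather than something tested against the original invoices, so you may need to adjust them to the layout pdfplumber actually produces.

```python
import re

# Match the heading, skip to the next non-blank line, and capture it.
# [ \t]* allows trailing spaces after the heading; \s* after the newline
# skips any blank lines; [^\n]+ captures up to the end of that line.
job_name_block_re = re.compile(r'Job Name[ \t]*\n\s*([^\n]+)')

# Hypothetical sample standing in for page.extract_text() output.
sample = 'Invoice 1\nJob Name\nAlbany Mall Development\nWellington 6011'

# With one capture group, findall returns the captured strings.
job_names = [name.strip() for name in job_name_block_re.findall(sample)]
print(job_names)
```

Because the search runs over the whole page text, no loop over individual lines is needed for the job name.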

Option 4

Use a parsing library of some sort instead of regular expressions. That's probably overkill in this simple case.

Upvotes: 1
