Reputation: 403
How can I print the line that follows a matched line in text that I extracted from a PDF using pdfplumber's extract_text() function?
I have tried line.next(), but it does not work.
The actual job name is on the line after "Job Name", as in the example below.
Job Name
Albany Mall Development
My code is below.
import re
import pdfplumber

jobName_re = re.compile(r'(Job Name)')
siteAddress_re = re.compile(r'(Wellington\s)(.+)')
file = 'invoices.pdf'
lines = []
with pdfplumber.open(file) as myPdf:
    for page in myPdf.pages:
        text = page.extract_text()
        for line in text.split('\n'):
            jobName = jobName_re.search(line)
            siteAddress = siteAddress_re.search(line)
            if jobName:
                print('The next line that follows Job Name is', line.next())
            elif siteAddress:
                print(siteAddress.group(1))
Upvotes: 1
Views: 1638
Reputation: 20640
You have several options.
You could switch to using an integer index to loop through the lines:
lines = text.split('\n')
for i in range(len(lines)):
    line = lines[i]
Then you can access the following line as lines[i+1], as long as i+1 is less than len(lines).
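A fuller sketch of the index-based option, assuming the page text is already in a string. The function name find_job_names and the sample text are made up for illustration. The bounds check covers the case where "Job Name" happens to be the last line on a page.

```python
import re

jobName_re = re.compile(r'Job Name')

def find_job_names(text):
    """Return the line that follows each 'Job Name' heading in text."""
    lines = text.split('\n')
    names = []
    for i, line in enumerate(lines):
        # Only look ahead when a next line actually exists.
        if jobName_re.search(line) and i + 1 < len(lines):
            names.append(lines[i + 1].strip())
    return names

# Made-up sample standing in for page.extract_text()
sample = "Invoice 1\nJob Name\nAlbany Mall Development\nTotal"
print(find_job_names(sample))  # ['Albany Mall Development']
```

enumerate gives you the index and the line together, so you do not need range(len(lines)).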
Set a flag to say you've seen the heading for job name, then pick it up next time round the loop. Something like this:
last_was_job_heading = False
for line in text.split('\n'):
    siteAddress = siteAddress_re.search(line)
    if last_was_job_heading:
        print('The next line that follows Job Name is', line)
    elif siteAddress:
        print(siteAddress.group(1))
    last_was_job_heading = jobName_re.search(line)
Don't split the text into lines at all. Instead use smarter regular expressions to parse multiple lines at once.
Use a parsing library of some sort instead of regular expressions. That's probably overkill in this simple case.
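For the multi-line regular expression option, a minimal sketch might look like the following. The pattern, the function name extract_job_names and the sample text are assumptions for illustration, not taken from the question's PDF. It relies on the fact that . does not match a newline by default, so (.+) captures exactly one line after the heading.

```python
import re

# Match the "Job Name" heading line, then capture the whole following line.
job_name_block_re = re.compile(r'Job Name[ \t]*\r?\n[ \t]*(.+)')

def extract_job_names(text):
    """Return every line that directly follows a 'Job Name' heading."""
    # With a single capture group, findall returns the captured strings.
    return [name.strip() for name in job_name_block_re.findall(text)]

# Made-up sample standing in for page.extract_text()
sample = (
    "Invoice 1\nJob Name\nAlbany Mall Development\nTotal\n"
    "Job Name\nQueen Street Fitout"
)
print(extract_job_names(sample))
```

This runs against the whole page text at once, so there is no need for text.split('\n') or for tracking state between lines.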
Upvotes: 1