Antonio Kallai

Reputation: 35

How to access data from pdf forms with python?

I need to access data from pdf form fields. I tried the package PyPDF2 with this code:

import PyPDF2

reader = PyPDF2.PdfReader('formular.pdf')
print(reader.pages[0].extract_text())

But this gives me only the text of the normal pdf data, not the form fields.

Does anyone know how to read text from the form fields?

Upvotes: 2

Views: 7596

Answers (2)

tromar

Reputation: 221

You can use the getFormTextFields() method, which returns a dictionary of the form's text fields (see https://pythonhosted.org/PyPDF2/PdfFileReader.html). Use the dictionary keys (the field names) to look up the field values. The following example might help:

from PyPDF2 import PdfFileReader

infile = "myInputPdf.pdf"

# Older PyPDF2 names; newer PyPDF2 releases call these PdfReader and get_form_text_fields()
with open(infile, "rb") as f:
    pdf_reader = PdfFileReader(f)
    dictionary = pdf_reader.getFormTextFields()  # returns a Python dictionary

# use the field name (dictionary key) to access the field value (dictionary value)
my_field_value = str(dictionary['my_field_name'])
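
Since the question already uses PyPDF2.PdfReader, here is a rough sketch of the same idea with the newer method name, get_form_text_fields(). The helper names form_text_values and read_form are made up for illustration, and I am assuming a PyPDF2 version that provides PdfReader; treat it as a starting point rather than a definitive implementation.

```python
def form_text_values(reader):
    """Return {field name: text value} for the text fields of a PDF form.

    `reader` is assumed to be a PyPDF2 PdfReader, or any object that
    exposes the same get_form_text_fields() method.
    """
    fields = reader.get_form_text_fields() or {}
    # A field with no value is mapped to an empty string here.
    return {name: "" if value is None else str(value)
            for name, value in fields.items()}


def read_form(path):
    """Hypothetical convenience wrapper: open `path` and read its text fields."""
    # Imported lazily so the helper above works without PyPDF2 installed.
    from PyPDF2 import PdfReader

    return form_text_values(PdfReader(path))


# Usage (assumes 'formular.pdf' from the question exists):
#   for name, value in read_form("formular.pdf").items():
#       print(name, value)
```

This only covers text fields. For checkboxes, dropdowns and the like, PdfReader.get_fields() returns all fields.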

Upvotes: 4

Anurag Misra

Reputation: 1544

There are Python libraries that let you access PDF data. PDF is not a plain-text format like CSV, TXT, or TSV, so Python cannot read the data inside a PDF file directly.

One such library is slate (see the Slate documentation). Read through the documentation; I hope it answers your question.

Upvotes: 0
