Reputation: 35
I need to access data from PDF form fields. I tried the PyPDF2 package with this code:
import PyPDF2
reader = PyPDF2.PdfReader('formular.pdf')
print(reader.pages[0].extract_text())
But this only gives me the regular page text of the PDF, not the contents of the form fields.
Does anyone know how to read text from the form fields?
Upvotes: 2
Views: 7596
Reputation: 221
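You can use the getFormTextFields() method to return a dictionary of form fields (see https://pythonhosted.org/PyPDF2/PdfFileReader.html). Use the dictionary keys (the field names) to access the values (the field values). Note that in PyPDF2 versions that provide the PdfReader class, as used in the question, the same method is named get_form_text_fields(). The following example might help: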
from PyPDF2 import PdfFileReader
infile = "myInputPdf.pdf"
with open(infile, "rb") as f:  # the file is closed automatically after reading
    pdf_reader = PdfFileReader(f)
    dictionary = pdf_reader.getFormTextFields()  # returns a Python dictionary
# use the field name (dictionary key) to access the field value (dictionary value)
my_field_value = str(dictionary['my_field_name'])
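If you are on a newer PyPDF2 release (the question uses PdfReader, so this is probably the case), the same idea could look roughly like the sketch below. It assumes PyPDF2 2.x or later, where the method is called get_form_text_fields(). The helper names read_form_text_fields and field_value, and the field name 'my_field_name', are made up for illustration and are not part of the library.

```python
def read_form_text_fields(path):
    """Return a dict mapping text-field names to their values.

    Assumes PyPDF2 2.x or later (PdfReader / get_form_text_fields).
    The import is done here so the helper below works without PyPDF2.
    """
    from PyPDF2 import PdfReader

    with open(path, "rb") as fh:
        reader = PdfReader(fh)
        # Fall back to an empty dict if the PDF has no form at all.
        return reader.get_form_text_fields() or {}


def field_value(fields, name, default=""):
    """Look up one field; fields left empty in the PDF come back as None."""
    value = fields.get(name)
    return default if value is None else str(value)


# Hypothetical usage with the file from the question:
# fields = read_form_text_fields("formular.pdf")
# print(field_value(fields, "my_field_name"))
```

Using .get() with a default avoids a KeyError when the field name is misspelled or the field was never filled in. Printing the whole dictionary first is a quick way to find out what the fields in your form are actually called.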
Upvotes: 4
Reputation: 1544
There are libraries in Python through which you can access PDF data. PDF is not a raw data format like csv, txt, or tsv, so Python can't directly read the data inside PDF files.
There is a Python library named slate (see the Slate documentation). Read this documentation; I hope you will find the answer to your question there.
Upvotes: 0