Reputation: 311
How can I fill a PDF file with forms with data and "flatten" it?
I use pdftk at the moment, but it does not process national characters correctly.
Is there any Python library or example how to fill PDF forms and render it to a non-editable PDF file?
Upvotes: 21
Views: 54538
Reputation: 136359
Straight from the pypdf docs (added years after the question was asked):
from pypdf import PdfReader, PdfWriter
reader = PdfReader("form.pdf")
writer = PdfWriter()
page = reader.pages[0]
fields = reader.get_fields()
writer.add_page(page)
writer.update_page_form_field_values(
writer.pages[0], {"fieldname": "some filled in text"}
)
with open("filled-out.pdf", "wb") as output_stream:
writer.write(output_stream)
Upvotes: 29
Reputation: 2813
You do not need a library per say to flatten the PDF, per the Adobe Docs, you can change the Bit Position of the Editable Form Fields to 1 to make the field ReadOnly. I provided a full solution here, but it uses Django:
https://stackoverflow.com/a/55301804/8382028
Adobe Docs (page 441):
https://opensource.adobe.com/dc-acrobat-sdk-docs/standards/pdfstandards/pdf/PDF32000_2008.pdf
Use PyPDF2 to fill the fields, then loop through the annotations to change the bit position:
from io import BytesIO
import PyPDF2
from PyPDF2.generic import BooleanObject, NameObject, IndirectObject, NumberObject
# open the pdf
input_stream = open("YourPDF.pdf", "rb")
pdf_reader = PyPDF2.PdfFileReader(input_stream, strict=False)
if "/AcroForm" in pdf_reader.trailer["/Root"]:
pdf_reader.trailer["/Root"]["/AcroForm"].update(
{NameObject("/NeedAppearances"): BooleanObject(True)})
pdf_writer = PyPDF2.PdfFileWriter()
set_need_appearances_writer(pdf_writer)
if "/AcroForm" in pdf_writer._root_object:
# Acro form is form field, set needs appearances to fix printing issues
pdf_writer._root_object["/AcroForm"].update(
{NameObject("/NeedAppearances"): BooleanObject(True)})
data_dict = dict() # this is a dict of your DB form values
pdf_writer.addPage(pdf_reader.getPage(0))
page = pdf_writer.getPage(0)
# update form fields
pdf_writer.updatePageFormFieldValues(page, data_dict)
for j in range(0, len(page['/Annots'])):
writer_annot = page['/Annots'][j].getObject()
for field in data_dict:
if writer_annot.get('/T') == field:
writer_annot.update({
NameObject("/Ff"): NumberObject(1) # make ReadOnly
})
output_stream = BytesIO()
pdf_writer.write(output_stream)
# output_stream is your flattened PDF
def set_need_appearances_writer(writer):
# basically used to ensured there are not
# overlapping form fields, which makes printing hard
try:
catalog = writer._root_object
# get the AcroForm tree and add "/NeedAppearances attribute
if "/AcroForm" not in catalog:
writer._root_object.update({
NameObject("/AcroForm"): IndirectObject(len(writer._objects), 0, writer)})
need_appearances = NameObject("/NeedAppearances")
writer._root_object["/AcroForm"][need_appearances] = BooleanObject(True)
except Exception as e:
print('set_need_appearances_writer() catch : ', repr(e))
return writer
Upvotes: 1
Reputation: 55
We can also consider using API instead of importing packages to work with PDF. This way has it's own advantages/disadvantages, but hey it gives us new perspective to enhance our applications!
One of example is to use PDF.co API to fill PDF forms. You can also consider other alternatives such as Adobe API, DocSpring, pdfFiller, etc. Following code snippet might be useful where it demonstrates filling PDF form using predefined JSON payload.
import os
import requests # pip install requests
# The authentication key (API Key).
# Get your own by registering at https://app.pdf.co/documentation/api
API_KEY = "**************************************"
# Base URL for PDF.co Web API requests
BASE_URL = "https://api.pdf.co/v1"
def main(args = None):
fillPDFForm()
def fillPDFForm():
"""Fill PDF form using PDF.co Web API"""
# Prepare requests params as JSON
# See documentation: https://apidocs.pdf.co
payload = "{\n \"async\": false,\n \"encrypt\": false,\n \"name\": \"f1040-filled\",\n \"url\": \"https://bytescout-com.s3-us-west-2.amazonaws.com/files/demo-files/cloud-api/pdf-form/f1040.pdf\",\n \"fields\": [\n {\n \"fieldName\": \"topmostSubform[0].Page1[0].FilingStatus[0].c1_01[1]\",\n \"pages\": \"1\",\n \"text\": \"True\"\n },\n {\n \"fieldName\": \"topmostSubform[0].Page1[0].f1_02[0]\",\n \"pages\": \"1\",\n \"text\": \"John A.\"\n }, \n {\n \"fieldName\": \"topmostSubform[0].Page1[0].f1_03[0]\",\n \"pages\": \"1\",\n \"text\": \"Doe\"\n }, \n {\n \"fieldName\": \"topmostSubform[0].Page1[0].YourSocial_ReadOrderControl[0].f1_04[0]\",\n \"pages\": \"1\",\n \"text\": \"123456789\"\n },\n {\n \"fieldName\": \"topmostSubform[0].Page1[0].YourSocial_ReadOrderControl[0].f1_05[0]\",\n \"pages\": \"1\",\n \"text\": \"Joan B.\"\n },\n {\n \"fieldName\": \"topmostSubform[0].Page1[0].YourSocial_ReadOrderControl[0].f1_05[0]\",\n \"pages\": \"1\",\n \"text\": \"Joan B.\"\n },\n {\n \"fieldName\": \"topmostSubform[0].Page1[0].YourSocial_ReadOrderControl[0].f1_06[0]\",\n \"pages\": \"1\",\n \"text\": \"Doe\"\n },\n {\n \"fieldName\": \"topmostSubform[0].Page1[0].YourSocial_ReadOrderControl[0].f1_07[0]\",\n \"pages\": \"1\",\n \"text\": \"987654321\"\n } \n\n\n\n ],\n \"annotations\":[\n {\n \"text\":\"Sample Filled with PDF.co API using /pdf/edit/add. Get fields from forms using /pdf/info/fields\",\n \"x\": 10,\n \"y\": 10,\n \"size\": 12,\n \"pages\": \"0-\",\n \"color\": \"FFCCCC\",\n \"link\": \"https://pdf.co\"\n }\n ], \n \"images\": [ \n ]\n}"
# Prepare URL for 'Fill PDF' API request
url = "{}/pdf/edit/add".format(BASE_URL)
# Execute request and get response as JSON
response = requests.post(url, data=payload, headers={"x-api-key": API_KEY, 'Content-Type': 'application/json'})
if (response.status_code == 200):
json = response.json()
if json["error"] == False:
# Get URL of result file
resultFileUrl = json["url"]
# Download result file
r = requests.get(resultFileUrl, stream=True)
if (r.status_code == 200):
with open(destinationFile, 'wb') as file:
for chunk in r:
file.write(chunk)
print(f"Result file saved as \"{destinationFile}\" file.")
else:
print(f"Request error: {response.status_code} {response.reason}")
else:
# Show service reported error
print(json["message"])
else:
print(f"Request error: {response.status_code} {response.reason}")
if __name__ == '__main__':
main()
Upvotes: -2
Reputation: 465
Give the fillpdf library a try, it makes this process very simple (pip install fillpdf
and poppler dependency conda install -c conda-forge poppler
)
Basic usage:
from fillpdf import fillpdfs
fillpdfs.get_form_fields("blank.pdf")
# returns a dictionary of fields
# Set the returned dictionary values a save to a variable
# For radio boxes ('Off' = not filled, 'Yes' = filled)
data_dict = {
'Text2': 'Name',
'Text4': 'LastName',
'box': 'Yes',
}
fillpdfs.write_fillable_pdf('blank.pdf', 'new.pdf', data_dict)
# If you want it flattened:
fillpdfs.flatten_pdf('new.pdf', 'newflat.pdf')
More info here: https://github.com/t-houssian/fillpdf
Seems to fill very well.
See this answer here for more info: https://stackoverflow.com/a/66809578/13537359
Upvotes: 14