Reputation: 295
I have a Python script that creates a number of PDF forms (0 to 10) and then concatenates them into one file. The fields in the compiled PDF show up differently in four different contexts. I am developing on Debian Linux, where the PDF viewer (Okular) does not show any fields in the compiled PDF. On Windows 10, if I open the PDF in Chrome, I have to hover over a field to see its value; the field data is correct on the first page, but each subsequent page is just a duplicate of the first page, which is incorrect. If I open the PDF in Microsoft Edge, it correctly displays the form data for each page, but when I go to print from Edge, none of the form data shows up.
I am using pdfrw to write to the PDF and PyPDF2 for merging. I have tried a number of different things, including attempting to flatten the PDF with Python (for which there is very little support, by the way), reading and writing instead of merging, and attempting to convert the form fields into text, along with many other things that I have since forgotten because they did not work.
import pdfrw

# Assumed definitions: the original post does not show these constants.
# The values below are the standard PDF key names they most likely hold.
Annot_Key = '/Annots'
Annot_Field_Key = '/T'
Subtype_Key = '/Subtype'
Widget_Subtype_Key = '/Widget'


def writeToPdf(unfilled, output, data, fields):
    '''Function writes the data from data to unfilled, and saves it as output'''
    # TODO: Use literal declarations for lists, dicts, etc
    checkboxes = [
        'misconduct_complete',
        'misconduct_incomplete',
        'not_final_exam',
        'supervise_exam',
        'not_final_home_exam',
        'not_final_assignment',
        'not_final_oral_exam',
        'not_final_lab_exam',
        'not_final_practical_exam',
        'not_final_other'
    ]
    template_pdf = pdfrw.PdfReader(unfilled)
    # Only the first page of the template is filled in.
    annotations = template_pdf.pages[0][Annot_Key]
    for annotation in annotations:
        # TODO: Singly nested if's with no else's suggest a logic problem, find a clearer way to do this.
        if annotation[Subtype_Key] == Widget_Subtype_Key:
            if annotation[Annot_Field_Key]:
                # Strip the surrounding parentheses from the PDF string.
                key = annotation[Annot_Field_Key][1:-1]
                if key in fields:
                    if key in checkboxes:
                        # Tick the checkbox by selecting its 'Yes' appearance state.
                        annotation.update(pdfrw.PdfDict(AS=pdfrw.PdfName('Yes')))
                    elif key == 'course':
                        # Course values are truncated to 8 characters.
                        annotation.update(pdfrw.PdfDict(V='{}'.format(data[key][0:8])))
                    else:
                        annotation.update(pdfrw.PdfDict(V='{}'.format(data[key])))
    pdfrw.PdfWriter().write(output, template_pdf)
from PyPDF2.generic import BooleanObject, IndirectObject, NameObject


def set_need_appearances_writer(writer):
    # basically used to ensure there are no
    # overlapping form fields, which makes printing hard
    try:
        catalog = writer._root_object
        # get the AcroForm tree and add the "/NeedAppearances" attribute
        if "/AcroForm" not in catalog:
            writer._root_object.update({
                NameObject("/AcroForm"): IndirectObject(len(writer._objects), 0, writer)})
        need_appearances = NameObject("/NeedAppearances")
        writer._root_object["/AcroForm"][need_appearances] = BooleanObject(True)
    except Exception as e:
        print('set_need_appearances_writer() catch : ', repr(e))
    return writer
from PyPDF2 import PdfFileReader, PdfFileWriter


def mergePDFs(listOfPdfPaths, outputPDf):
    '''Function Merges a list of pdfs into a single one, and saves it to outputPDf'''
    pdf_writer = PdfFileWriter()
    set_need_appearances_writer(pdf_writer)
    pdf_writer.setPageMode('/UseOC')
    for path in listOfPdfPaths:
        pdf_reader = PdfFileReader(path)
        # Copy every page of this PDF into the combined output.
        for page in range(pdf_reader.getNumPages()):
            pdf_writer.addPage(pdf_reader.getPage(page))
    with open(outputPDf, 'wb') as fh:
        pdf_writer.write(fh)
As mentioned above, the results differ by context. On Debian Linux, Okular shows no form fields. On Windows 10, Google Chrome shows the first page's field data duplicated on every later page (and I have to hover over or click a field to see it). Microsoft Edge shows the correct data, with each page having its own field values, but its print preview shows no form data either.
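Since every form is filled from the same template, one thing that may be worth checking is whether the merged file ends up with the same field names on several pages. The following is only a minimal sketch of such a check, assuming pdfrw is installed; `duplicated_names` and `field_names_per_page` are hypothetical helper names, and the `[1:-1]` slice copies the parenthesis-stripping used in `writeToPdf`.

```python
from collections import Counter


def duplicated_names(names_per_page):
    """Return the field names that occur on more than one page, sorted."""
    pages_seen = Counter()
    for names in names_per_page:
        # Count each name once per page, so repeats within a page are ignored.
        pages_seen.update(set(names))
    return sorted(name for name, count in pages_seen.items() if count > 1)


def field_names_per_page(pdf_path):
    """Collect the /T (field name) of every widget annotation, page by page."""
    import pdfrw  # imported here so duplicated_names() works without pdfrw

    names_per_page = []
    for page in pdfrw.PdfReader(pdf_path).pages:
        names = []
        # pdfrw returns None for a missing key, hence the "or []".
        for annotation in page['/Annots'] or []:
            if annotation['/Subtype'] == '/Widget' and annotation['/T']:
                names.append(annotation['/T'][1:-1])  # strip the parentheses
        names_per_page.append(names)
    return names_per_page
```

For example, `duplicated_names(field_names_per_page('merged.pdf'))` (with `merged.pdf` standing in for the merged output) would list the names shared between pages; an empty list would suggest that name collisions are not the cause.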
Upvotes: 1
Views: 732
Reputation: 295
If anyone else is having this quite obscure problem: the behavior is unspecified for the use case I was dealing with (a fillable template form reused for every page, so the merged pages share the same field names). The only solution available with Python at the moment (at least that I found in my many hours of researching and testing) was to flatten the template PDF, create a separate PDF with the form data written at the desired locations (I did this with reportlab), and then overlay that PDF on the template. Overall this is not a good solution for many reasons, so if you have a better one, please post it!
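Roughly, the overlay step could look like the sketch below. It is not my exact code: it assumes reportlab and a pre-3.0 PyPDF2 (the same `PdfFileReader`/`PdfFileWriter` names as in the question), a template that has already been flattened, and field rectangles given as `(x0, y0, x1, y1)` in PDF points. The function names `text_origin`, `make_overlay` and `stamp` are made up for illustration.

```python
import io


def text_origin(rect, padding=2):
    """Bottom-left drawing point for text inside a field rectangle.

    rect is (x0, y0, x1, y1) in PDF points, like a widget's /Rect entry.
    """
    x0, y0, x1, y1 = (float(v) for v in rect)
    return (min(x0, x1) + padding, min(y0, y1) + padding)


def make_overlay(page_size, placements):
    """Draw each (rect, text) pair onto a blank one-page PDF held in memory."""
    from reportlab.pdfgen import canvas  # third-party, imported lazily

    buffer = io.BytesIO()
    pdf = canvas.Canvas(buffer, pagesize=page_size)
    for rect, text in placements:
        x, y = text_origin(rect)
        pdf.drawString(x, y, str(text))
    pdf.save()
    buffer.seek(0)
    return buffer


def stamp(template_path, overlay_buffer, output_path):
    """Merge the overlay onto the template's first page and save the result."""
    from PyPDF2 import PdfFileReader, PdfFileWriter  # pre-3.0 PyPDF2 names

    with open(template_path, 'rb') as template_file:
        page = PdfFileReader(template_file).getPage(0)
        # Paint the overlay's content on top of the template page.
        page.mergePage(PdfFileReader(overlay_buffer).getPage(0))
        writer = PdfFileWriter()
        writer.addPage(page)
        with open(output_path, 'wb') as fh:
            writer.write(fh)
```

Each stamped page is then plain page content with no form fields, so the pages can be merged as before without field names colliding. The obvious cost is that the coordinates have to be kept in sync with the template by hand.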
Upvotes: 0