Reputation: 2866
Is there any way to use Python to create PDF documents from HTML/CSS/Javascript, without introducing any OS-level dependencies?
It seems every existing solution requires special supplemental software, but upon reviewing PDF formatting specifications and HTML/CSS/Javascript rendering, there doesn't appear to be a reason why a Python solution can't exist without them. Some solutions come close, such as pyppeteer, but it still leans on a headless Chrome installation locally. These dependencies mean that microservices can't be leveraged, even though PDF generation would otherwise seem to be a viable use case for them.
While similar questions have come up many times over on SO, there doesn't appear to have been a viable technique shown without having to install specialized dependencies on the OS.
Some similar questions which routinely recommend wkhtmltopdf
or are otherwise out of date (e.g., moving PDF printing support outside of Chrome is dead now):
If I've somehow missed a viable approach, please feel free to mark this as a duplicate with my thanks!
Edit February 2021: It appears that the cefpython
project may meet these demands - PDF printing support seems like it could be implemented in the near future.
Edit November 2023: Still no solutions for this, and cefpython
appears to be a dead project now. If leveraging microservices, it's probably worth considering a native Node environment for this use case instead of Python.
Upvotes: 9
Views: 12577
Reputation: 4269
So to clarify and formalize what others have said:
If you want to create PDF documents from HTML/CSS/javascript content, you will necessarily need a javascript engine (because you obviously need to execute the javascript if it affects the visuals of the document). This is the most complex component that you need.
As for now, there is no ECMAscript compliant engine written in pure python that is well-maintained (that would be a huge project)... There will probably never be one, since compilers and VMs for languages need to be performant and are thus usually written in a performant low-level language.
So you will always need compiled binaries for that and the HTML renderers which are less complex but also need to be performant if used in browsers, so usually they're also C++ or the likes.
The javascript engine and HTML renderer are the major part of a browser, so a headless browser is a good solution to this requirement.
Upvotes: 4
Reputation: 41
Try this library: xhtml2pdf
It worked for me. Here is the documentation: doc
Some sample code:
from xhtml2pdf import pisa
def convert_html_to_pdf(source_html, output_filename):
# open output file for writing (truncated binary)
result_file = open(output_filename, "w+b")
# convert HTML to PDF
pisa_status = pisa.CreatePDF(
source_html, # the HTML to convert
dest=result_file) # file handle to recieve result
# close output file
result_file.close() # close output file
# return False on success and True on errors
return pisa_status.err
# Define your data
source_html = open('2020-06.html')
output_filename = "test.pdf"
convert_html_to_pdf(source_html, output_filename)
Upvotes: 4