bsplosion
bsplosion

Reputation: 2866

Creating PDFs from HTML/Javascript in Python with no OS dependencies

Is there any way to use Python to create PDF documents from HTML/CSS/Javascript, without introducing any OS-level dependencies?

It seems every existing solution requires special supplemental software, but upon reviewing PDF formatting specifications and HTML/CSS/Javascript rendering, there doesn't appear to be a reason why a Python solution can't exist without them. Some solutions come close, such as pyppeteer, but it still leans on a headless Chrome installation locally. These dependencies mean that microservices can't be leveraged, even though PDF generation would otherwise seem to be a viable use case for them.

While similar questions have come up many times over on SO, there doesn't appear to have been a viable technique shown without having to install specialized dependencies on the OS.

Some similar questions which routinely recommend wkhtmltopdf or are otherwise out of date (e.g., moving PDF printing support outside of Chrome is dead now):

If I've somehow missed a viable approach, please feel free to mark this as a duplicate with my thanks!

Edit February 2021: It appears that the cefpython project may meet these demands - PDF printing support seems like it could be implemented in the near future.

Edit November 2023: Still no solutions for this, and cefpython appears to be a dead project now. If leveraging microservices, it's probably worth considering a native Node environment for this use case instead of Python.

Upvotes: 9

Views: 12577

Answers (2)

reverse_engineer
reverse_engineer

Reputation: 4269

So to clarify and formalize what others have said:

  • If you want to create PDF documents from HTML/CSS/javascript content, you will necessarily need a javascript engine (because you obviously need to execute the javascript if it affects the visuals of the document). This is the most complex component that you need.

  • As for now, there is no ECMAscript compliant engine written in pure python that is well-maintained (that would be a huge project)... There will probably never be one, since compilers and VMs for languages need to be performant and are thus usually written in a performant low-level language.

  • So you will always need compiled binaries for that and the HTML renderers which are less complex but also need to be performant if used in browsers, so usually they're also C++ or the likes.

  • The javascript engine and HTML renderer are the major part of a browser, so a headless browser is a good solution to this requirement.

Upvotes: 4

Acaelesto
Acaelesto

Reputation: 41

Try this library: xhtml2pdf

It worked for me. Here is the documentation: doc

Some sample code:

from xhtml2pdf import pisa             

def convert_html_to_pdf(source_html, output_filename):
    # open output file for writing (truncated binary)
    result_file = open(output_filename, "w+b")

    # convert HTML to PDF
    pisa_status = pisa.CreatePDF(
            source_html,                # the HTML to convert
            dest=result_file)           # file handle to recieve result

    # close output file
    result_file.close()                 # close output file

    # return False on success and True on errors
    return pisa_status.err

# Define your data
source_html = open('2020-06.html')
output_filename = "test.pdf"
convert_html_to_pdf(source_html, output_filename)

Upvotes: 4

Related Questions