Sachin Chauhan

Reputation: 85

Convert Html to PDF Python/Django on Unix Platform

I’m working on a feature that needs to convert a large HTML file (more than 1 MB) to PDF. I’ve tried the two open-source Python libraries below:

1. xhtml2pdf (Pisa)
2. WeasyPrint

Neither of them solves my problem: each takes around 4-5 minutes to generate a 1 MB PDF file (around 500 pages), which causes my app server’s worker process (Gunicorn and Nginx) to go down and the browser to show a ‘GATEWAY TIMEOUT’ error. CPU utilization also goes up to 100% while the PDF conversion is in progress.

Does anybody know which API or library would be best suited to large HTML files?

Upvotes: 0

Views: 408

Answers (2)

bruno desthuilliers

Reputation: 77902

Generating a 500-page PDF will take time whatever technology you use, so the solution is to send the job to an async task queue (celery, huey, django-queue, ...), possibly with some polling to show a progress bar. Even if you manage to optimize the crap out of the generation process, it will STILL take too much time to fit in an HTTP request/response cycle (from the user's POV at least, even one minute is already way too long).

NB: having your CPU maxing out is nothing surprising either. Generating a huge PDF not only takes time, it's also a computation-heavy process, and one that easily eats your memory too. This by itself is another reason to use a distributed task queue, so you can run the process on a distinct node and avoid killing your front server.
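To make the "enqueue, then poll" flow concrete, here is a minimal sketch using only the standard library. It is not Celery or huey: a `ThreadPoolExecutor` stands in for the task queue so the example runs anywhere, and `render_pdf`, `start_job` and `job_status` are hypothetical names invented for illustration. A thread inside the web process would not take the CPU load off the front server, so in a real deployment the same two entry points would hand the work to a queue worker running in a separate process or on a separate node.

```python
import time
import uuid
from concurrent.futures import ThreadPoolExecutor

# Stand-in for a real task queue worker (Celery, huey, ...); assumption for this sketch.
_executor = ThreadPoolExecutor(max_workers=1)
_jobs = {}  # job id -> Future; a real queue would keep this in its result backend


def render_pdf(html):
    """Hypothetical placeholder for the slow HTML-to-PDF conversion."""
    time.sleep(0.1)  # simulate the long-running work
    return b"%PDF placeholder for " + str(len(html)).encode() + b" characters"


def start_job(html):
    """Called by the 'start' view: enqueue the work and return immediately."""
    job_id = uuid.uuid4().hex
    _jobs[job_id] = _executor.submit(render_pdf, html)
    return job_id


def job_status(job_id):
    """Called by the polling view: report progress without blocking."""
    future = _jobs.get(job_id)
    if future is None:
        return {"state": "unknown"}
    if not future.done():
        return {"state": "pending"}
    if future.exception() is not None:
        return {"state": "failed", "error": str(future.exception())}
    return {"state": "done", "size": len(future.result())}
```

The browser would call the start endpoint once, get the job id back right away, and then poll the status endpoint every few seconds until the state is `done`, at which point it can fetch the file. No single HTTP request lasts longer than a few milliseconds, so the gateway timeout no longer applies.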

Upvotes: 1

Constantine Ketskalo

Reputation: 621

It's just a guess, as I've never used it, but I found this answer: C++ Library to Convert HTML to PDF? As far as I know, Cython can be used to combine C/C++ and Python, so that would probably speed things up.

Otherwise you would need to either break the document into small pieces and merge them, or adjust the timeout parameter in the classes responsible for it. The timeout would have to be changed on both sides, server and client. I guess you would also need to calculate it dynamically depending on the file size and the time needed, and that doesn't sound to me like the best decision, but just in case...
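The "break it into small pieces and merge them" idea could look roughly like the sketch below. It assumes the HTML can already be split into independent sections (for example one per record), which the question does not confirm, and that page numbering or layout spanning chunks does not matter. `batched` and `render_in_chunks` are hypothetical helpers, and `render` and `merge` are passed in as callables because the actual conversion and the PDF merge would come from whichever libraries you pick.

```python
def batched(sections, size):
    """Yield consecutive groups of at most `size` sections."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(sections), size):
        yield list(sections[start:start + size])


def render_in_chunks(sections, size, render, merge):
    """Render each batch separately, then combine the results.

    `render` takes an HTML string and returns one rendered part;
    `merge` takes the list of parts and returns the final document.
    Both are supplied by the caller (assumptions for this sketch).
    """
    parts = [render("".join(batch)) for batch in batched(sections, size)]
    return merge(parts)
```

Each call to `render` handles only a small document, so the chunks could also be handed to separate workers and merged at the end.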

Upvotes: 0
