Ehvince

Reputation: 18385

WeasyPrint PDF generation takes too long and makes the download impossible. How can I stream its creation?

I've been using WeasyPrint for PDF generation successfully until the documents reach a certain size, which is a common use case in my app. At that point, generating the PDF takes so long (more than 10 s) that the connection with the browser breaks and the download fails.

I suppose I must stream the file creation and return a Django StreamingHttpResponse (do you agree?). I don't want to pre-generate the PDF, because it is built from baskets whose items users frequently add or delete.

But how can I stream the file creation with WeasyPrint? Even if I split my sourceHtml string into parts, how would I write the PDF step by step?

I render a Django template and generate the PDF from it:

from django.http import HttpResponse
from weasyprint import HTML

sourceHtml = template.render(my_objects)
pdf_bytes = HTML(string=sourceHtml).write_pdf()  # blocks until the whole PDF is built

response = HttpResponse(pdf_bytes, content_type='application/pdf')
response['Content-Disposition'] = u'attachment; filename="{}.pdf"'.format(name)
return response

Or is there another way to solve this problem?

Thanks!

Upvotes: 0

Views: 5343

Answers (2)

usersina

Reputation: 1793

I did not need to implement streaming, but running the PDF generation in a separate thread is at least good enough to keep the rest of my async application working as intended while the PDF is being built:

import asyncio
import concurrent.futures

from weasyprint import HTML


async def generate_pdf(html_content: str, base_url: str) -> bytes:
    """Generate PDF from HTML content.

    Args:
        html_content (str): The HTML content to convert to PDF.
        base_url (str): The base URL for the links in the HTML content.

    Raises:
        ValueError: If the PDF generation fails.

    Returns:
        bytes: The PDF content.
    """

    # Function to generate PDF
    def _generate_pdf():
        return HTML(string=html_content, base_url=base_url).write_pdf()

    # Run the PDF generation in a separate thread using asyncio
    loop = asyncio.get_running_loop()  # preferred over get_event_loop() inside a coroutine
    with concurrent.futures.ThreadPoolExecutor() as executor:
        pdf_content = await loop.run_in_executor(executor, _generate_pdf)

    if pdf_content is None:
        raise ValueError("PDF generation failed, returned None")

    return pdf_content

I use it as follows in an async route (Response here is the Starlette/FastAPI response class):

pdf_content = await generate_pdf(html_content, str(request.base_url))
# Use "attachment" to force a download, or "inline" to open the PDF in the browser
headers = {"Content-Disposition": f'inline; filename="{filename}.pdf"'}
return Response(
    headers=headers, content=pdf_content, media_type="application/pdf"
)
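On Python 3.9+, a slightly simpler variant of the same idea uses asyncio.to_thread, which reuses the event loop's default thread pool instead of creating a new ThreadPoolExecutor on every call. This is a minimal sketch, not the answerer's code: the render parameter and the _render_with_weasyprint helper are hypothetical names, added so the coroutine can be exercised without WeasyPrint installed.

```python
import asyncio
from typing import Callable


def _render_with_weasyprint(html_content: str, base_url: str) -> bytes:
    # Hypothetical default renderer; imported lazily so this module still
    # loads where WeasyPrint is not installed.
    from weasyprint import HTML
    return HTML(string=html_content, base_url=base_url).write_pdf()


async def generate_pdf(
    html_content: str,
    base_url: str,
    render: Callable[[str, str], bytes] = _render_with_weasyprint,
) -> bytes:
    # asyncio.to_thread runs the blocking render call in the loop's default
    # thread pool, so the event loop keeps serving other requests meanwhile.
    pdf_content = await asyncio.to_thread(render, html_content, base_url)
    # Reject both None and empty output.
    if not pdf_content:
        raise ValueError("PDF generation failed, returned no data")
    return pdf_content
```

Note that this only keeps the server responsive: the request itself still lasts as long as the rendering, so on its own it does not prevent a browser-side timeout on very large documents.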

Upvotes: 0

Ehvince

Reputation: 18385

I asked on the issue tracker: https://github.com/Kozea/WeasyPrint/issues/416

Streaming the PDF creation is not doable, and the suggested workaround is to

split the download into two steps: one route asynchronously generates the document and stores it on the filesystem, the second route downloads the generated document. When the document is not generated yet, you can hide the second link and display something like "the document is not generated yet" instead.
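The two-step workaround quoted above could be sketched roughly like this. It is a minimal, framework-agnostic sketch, not WeasyPrint's or the maintainers' code: PdfJobStore, start() and result_path() are hypothetical names, and render stands for any callable that turns HTML into PDF bytes (for example lambda html: HTML(string=html).write_pdf()). Jobs are tracked in an in-memory dict, so it assumes a single server process.

```python
import threading
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Dict, Optional


class PdfJobStore:
    """Hypothetical helper for the two-step download pattern.

    start() launches PDF generation in a background thread and returns a job
    id at once; result_path() lets the download route check whether the file
    is ready yet.
    """

    def __init__(self, output_dir: Path, render: Callable[[str], bytes],
                 max_workers: int = 2) -> None:
        self.output_dir = Path(output_dir)
        self.output_dir.mkdir(parents=True, exist_ok=True)
        self.render = render
        self._executor = ThreadPoolExecutor(max_workers=max_workers)
        self._futures: Dict[str, Future] = {}
        self._lock = threading.Lock()

    def start(self, html_content: str) -> str:
        """Step 1: schedule generation and return immediately with a job id."""
        job_id = uuid.uuid4().hex
        with self._lock:
            self._futures[job_id] = self._executor.submit(
                self._generate, job_id, html_content)
        return job_id

    def _generate(self, job_id: str, html_content: str) -> Path:
        pdf_bytes = self.render(html_content)
        final_path = self.output_dir / f"{job_id}.pdf"
        tmp_path = self.output_dir / f"{job_id}.pdf.part"
        tmp_path.write_bytes(pdf_bytes)
        # Rename only once fully written, so a half-written file is never served.
        tmp_path.replace(final_path)
        return final_path

    def result_path(self, job_id: str) -> Optional[Path]:
        """Step 2: the PDF path if ready, or None while still generating.

        Raises KeyError for an unknown job id and re-raises any rendering error.
        """
        with self._lock:
            future = self._futures[job_id]
        if not future.done():
            return None
        return future.result()
```

The first route would call store.start(html) and return the job id (or a link containing it); the second route would call store.result_path(job_id) and either show "the document is not generated yet" when it gets None, or send the file (for example with Django's FileResponse). With several worker processes, or to survive restarts, the job state would have to live somewhere shared, such as a database or a task queue like Celery, and old files would need cleaning up.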

Upvotes: 1
