Reputation: 18385
I've been using WeasyPrint for PDF generation successfully, but past a certain document size, which is a common use case in my app, generation takes so long (more than 10 s) that the connection with the browser breaks and the download fails.
I suppose I should stream the file creation and return a Django StreamingHttpResponse (is that right?). I'd rather not pre-generate the PDF, because it is built from baskets whose items users frequently add or delete.
But how can I stream the file creation with WeasyPrint? Even if I cut my sourceHtml string into parts, how would I write the PDF step by step?
I render a django template and generate the pdf from it:
from weasyprint import HTML
sourceHtml = template.render(my_objects)  # my_objects: the template context
outhtml = HTML(string=sourceHtml).write_pdf()
response = HttpResponse(outhtml, content_type='application/pdf')
response['Content-Disposition'] = u'attachment; filename="{}.pdf"'.format(name)
Or is there another way to solve this problem?
Thanks!
Upvotes: 0
Views: 5343
Reputation: 1793
I did not need to implement streaming, but running the generation in a separate thread is at least enough to guarantee that my main application keeps working as intended:
import asyncio
import concurrent.futures
from weasyprint import HTML

async def generate_pdf(html_content: str, base_url: str) -> bytes:
    """Generate PDF from HTML content.

    Args:
        html_content (str): The HTML content to convert to PDF.
        base_url (str): The base URL for the links in the HTML content.
    Raises:
        ValueError: If the PDF generation fails.
    Returns:
        bytes: The PDF content.
    """
    # Function to generate PDF
    def _generate_pdf():
        return HTML(string=html_content, base_url=base_url).write_pdf()

    # Run the PDF generation in a separate thread using asyncio
    loop = asyncio.get_event_loop()
    with concurrent.futures.ThreadPoolExecutor() as executor:
        pdf_content = await loop.run_in_executor(executor, _generate_pdf)
    if pdf_content is None:
        raise ValueError("PDF generation failed, returned None")
    return pdf_content
I use it as follows:
pdf_content = await generate_pdf(html_content, str(request.base_url))
# "attachment" forces a download; "inline" opens the PDF in the browser.
headers = {"Content-Disposition": f'inline; filename="{filename}.pdf"'}
return Response(
    headers=headers, content=pdf_content, media_type="application/pdf"
)
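To see why moving the render off the event loop keeps the rest of the application responsive, here is a minimal sketch that does not need WeasyPrint installed. `blocking_render` is a hypothetical stand-in for `HTML(...).write_pdf()`, `heartbeat` is a hypothetical coroutine that only counts how often the loop gets to run, and `asyncio.to_thread` (Python 3.9+) is swapped in as a shorter alternative to the explicit `ThreadPoolExecutor` used above.

```python
import asyncio
import time


def blocking_render(html_content: str) -> bytes:
    """Hypothetical stand-in for HTML(string=html_content).write_pdf()."""
    time.sleep(0.2)  # simulate a slow render that blocks the calling thread
    return b"%PDF-stub " + html_content.encode()


async def heartbeat(stop: asyncio.Event) -> int:
    """Count how often the event loop gets to run while the render is in progress."""
    ticks = 0
    while not stop.is_set():
        ticks += 1
        await asyncio.sleep(0.01)
    return ticks


async def main() -> tuple[bytes, int]:
    stop = asyncio.Event()
    beat = asyncio.create_task(heartbeat(stop))
    # Run the blocking call in the default thread pool; the loop stays free meanwhile.
    pdf = await asyncio.to_thread(blocking_render, "<p>hello</p>")
    stop.set()
    ticks = await beat
    return pdf, ticks


if __name__ == "__main__":
    pdf, ticks = asyncio.run(main())
    print(len(pdf), "bytes rendered;", ticks, "heartbeat ticks meanwhile")
```

The heartbeat keeps ticking during the render, which is the property this answer relies on. Note that the client still waits for the whole render, so this alone does not shorten the request described in the question.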
Upvotes: 0
Reputation: 18385
I asked on the issue tracker: https://github.com/Kozea/WeasyPrint/issues/416
It turns out streaming is not doable; the suggested workaround is to:
split the download into two steps: one route asynchronously generates the document and stores it on the filesystem, the second route downloads the generated document. When the document is not generated yet, you can hide the second link and display something like "the document is not generated yet" instead.
Upvotes: 1