yadavankit

Reputation: 353

Large PDFs taking exponentially longer with ReportLab

I am using ReportLab to generate PDF reports; the code is below. The problem is that for X pages the build takes time T, but for 2X pages it takes a lot more than 2T. Since I need to generate PDFs that may run to 35,000 pages, this is a big hassle. What can I do to work around this issue?

from reportlab.platypus import SimpleDocTemplate, LongTable
from reportlab.lib.pagesizes import letter

class JournalPDFGenerator(object):
    """
    Generates the journal PDF with ReportLab.
    """

    def __init__(self, pdf_name, profile_report_id):
        self.pdf_name = pdf_name
        self.profile_report_id = profile_report_id
        # ProfileWatchReport is a Django model defined elsewhere in the app.
        self.profile_report = ProfileWatchReport.objects.get(id=self.profile_report_id)
        self.document = SimpleDocTemplate(self.pdf_name, pagesize=letter)
        self.story = []

    def get_prepared_rows(self):
        # Simplified for this question: the real method yields one row per
        # journal entry; your_mark_details and threat_mark_details are
        # placeholders.
        row = [your_mark_details, threat_mark_details]
        yield row

    def generate_pdf(self):
        report_table = LongTable(list(self.get_prepared_rows()))
        self.story.append(report_table)
        self.document.build(self.story)

Upvotes: 3

Views: 1619

Answers (2)

DShost

Reputation: 483

I spent a lot of time tracking down the cause of this problem. Instead of LongTable, you can try my BigDataTable class, which is optimized for processing big data.

Gist: BigDataTable, a faster LongTable for big data

Tested with 6500 rows and 7 columns:

  • LongTable: over 1 hour total document build time
  • BigDataTable: ~24.2 seconds total document build time
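
A minimal usage sketch, assuming you have copied the BigDataTable class from the gist into a local module (the module name bigdatatable below is hypothetical) and that it keeps LongTable's constructor interface; verify both against the gist:

from reportlab.lib.pagesizes import letter
from reportlab.platypus import SimpleDocTemplate

# Hypothetical module name -- import BigDataTable from wherever you
# saved the gist's code.
from bigdatatable import BigDataTable

rows = [['mark', 'threat']] * 6500  # dummy data: 6500 rows

doc = SimpleDocTemplate('report.pdf', pagesize=letter)
# Drop-in replacement for LongTable: same row-data argument.
doc.build([BigDataTable(rows)])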

Upvotes: 2

Endre Both

Reputation: 5740

35k pages is not exactly mainstream PDF use, so any glitches are not entirely unexpected. A few ideas to explore:

  • It could be that the machine simply runs out of RAM dealing with that amount of data, in which case a hardware upgrade would help.
  • You could try splitting the data into several tables rather than one big one to see if this improves performance (see the first sketch after this list).
  • Would it be possible to split the content into several files, either temporarily (to be stitched back together into one file with a different tool such as Ghostscript; see the second sketch after this list) or permanently?
  • Would it be possible to handle pagination yourself (e.g. if the length of the content elements is predictable)? It may (or may not) be that pagination of very large tables gets out of hand.
  • You could test a different data structure than LongTable over the same length of content to check whether the problem is tied to that particular structure; if it is, you might be able to find an alternative.
  • Finally (or firstly, depending on your inclination), you could look into the relevant code and/or raise an issue with the ReportLab team.
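
A minimal sketch of the several-smaller-tables idea from the second bullet; the chunk size of 500 is an arbitrary starting point to tune:

from reportlab.lib.pagesizes import letter
from reportlab.platypus import LongTable, SimpleDocTemplate

def chunked(rows, size=500):
    # Yield successive slices of the row list.
    for start in range(0, len(rows), size):
        yield rows[start:start + size]

rows = [['mark', 'threat']] * 10000  # dummy data
# One modest LongTable per chunk instead of a single huge one.
story = [LongTable(chunk) for chunk in chunked(rows)]
SimpleDocTemplate('chunked_report.pdf', pagesize=letter).build(story)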
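
And a sketch of the split-and-stitch idea from the third bullet: build each chunk as its own PDF, then merge the parts with Ghostscript (this assumes the gs binary is installed and on your PATH):

import subprocess
from reportlab.lib.pagesizes import letter
from reportlab.platypus import LongTable, SimpleDocTemplate

rows = [['mark', 'threat']] * 10000  # dummy data
part_names = []
for i, start in enumerate(range(0, len(rows), 2000)):
    name = 'part_%03d.pdf' % i
    doc = SimpleDocTemplate(name, pagesize=letter)
    doc.build([LongTable(rows[start:start + 2000])])
    part_names.append(name)

# Stitch the parts back together into one file with Ghostscript.
subprocess.run(['gs', '-dBATCH', '-dNOPAUSE', '-q', '-sDEVICE=pdfwrite',
                '-sOutputFile=merged.pdf'] + part_names, check=True)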

Upvotes: 0
