NateH06

Reputation: 3574

Convert HTML to PDF - PDFKit File Size too Large

I have one complete HTML file (static, it makes no calls to the internet) that's under 900 KB, and I'm currently using PDFKit to turn it into a single PDF of about 100 pages. The resulting PDF is 30-40 MB, which is way too large considering each page is just text plus a small image repeated 4 times.

The way I create the PDF is pretty simple.

Installation:

apt-get install wkhtmltopdf -y
pip install pdfkit==1.0.0
pip install pypdf2==2.10.5

Code:

import pdfkit

def html_to_pdf(html_path: str, pdf_path: str):
    pdfkit.from_file(
        input=html_path, 
        output_path=pdf_path,
        configuration=pdfkit.configuration(),
        options={
            'zoom': '0.9588', # seemed to be the right zoom through trial and error
            'disable-smart-shrinking': '', 
            'page-size': 'Letter',
            'orientation': 'Landscape',
            'margin-top': '0',
            'margin-right': '0',
            'margin-left': '0',
            'margin-bottom': '0',
            'encoding': "UTF-8",
        })

html_to_pdf("./my_html_file.html", "my_pdf_file.pdf")

I've tried resizing the image I pull in, shrinking it to about 30% of its original size, but the size of the resulting .pdf didn't change at all.

What I notice about the PDFs I generate with PDFKit is that they aren't really text PDFs: you can't search the text, highlight text blocks, etc. Each page behaves like one big image. When I print the same HTML from my browser and save it as a PDF, I can do all of those things.
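One way to check the "big image on every page" impression programmatically is to ask a PDF reader how much text it can extract from each page. The sketch below assumes PyPDF2 2.x (the version installed above) and its `PdfReader` and `extract_text()` calls; `summarize_pages` and `inspect_pdf` are hypothetical helper names used only for illustration.

```python
import os


def summarize_pages(size_bytes: int, page_texts: list) -> dict:
    """Summarize file size per page and how many pages carry extractable text."""
    pages = len(page_texts)
    # A page counts as "text" only if extraction returned non-whitespace content.
    with_text = sum(1 for text in page_texts if text and text.strip())
    return {
        "pages": pages,
        "pages_with_text": with_text,
        "kb_per_page": round(size_bytes / 1024 / pages, 1) if pages else 0.0,
    }


def inspect_pdf(pdf_path: str) -> dict:
    """Read a PDF with PyPDF2 and report size per page and text coverage."""
    # Imported lazily so summarize_pages stays usable without PyPDF2 installed.
    from PyPDF2 import PdfReader

    reader = PdfReader(pdf_path)
    texts = [page.extract_text() for page in reader.pages]
    return summarize_pages(os.path.getsize(pdf_path), texts)


# Example call (assumes the file generated earlier exists):
# print(inspect_pdf("my_pdf_file.pdf"))
```

If `pages_with_text` comes back as 0 for a document that is mostly text, that would be consistent with the pages being stored as images or outlines rather than as selectable text. At 30-40 MB over roughly 100 pages, `kb_per_page` would land around 300 to 400 KB.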

I need to build this programmatically, so it has to be automated. Is there some setting I'm missing in PDFKit?

Also worth noting: I have access to the actual string used to make the HTML, so I don't have to read an HTML file. Would that make a difference?
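For reference, pdfkit also exposes `from_string`, which takes the HTML as a string instead of a file path. A minimal sketch, reusing the options from the function above; `build_options` and `html_string_to_pdf` are hypothetical names. Both entry points hand the document to wkhtmltopdf, so this is shown as an alternative way to call it, not as a known fix for the file size.

```python
def build_options() -> dict:
    """Return the same wkhtmltopdf options used with from_file above."""
    return {
        "zoom": "0.9588",
        "disable-smart-shrinking": "",
        "page-size": "Letter",
        "orientation": "Landscape",
        "margin-top": "0",
        "margin-right": "0",
        "margin-left": "0",
        "margin-bottom": "0",
        "encoding": "UTF-8",
    }


def html_string_to_pdf(html: str, pdf_path: str) -> bool:
    """Render an in-memory HTML string to a PDF file via pdfkit."""
    # Imported lazily so build_options can be used without pdfkit installed.
    import pdfkit

    return pdfkit.from_string(html, pdf_path, options=build_options())


# Example call (requires pdfkit and wkhtmltopdf to be installed):
# html_string_to_pdf("<html><body><span>Text 1</span></body></html>", "my_pdf_file.pdf")
```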

I'm also open to not using PDFKit at all. I just need something that doesn't require a license.

======

Addendum - it's fonts!

I use custom fonts that are .otf files, so I run some code that base64-encodes each file. I then store each font as that encoded string in the <head>, like this:

<style>
@font-face {
    font-family: "my_custom_font";
    src: url(data:font/woff2;base64, asdlfjsads92932super-long-byte-string-here) format("woff2");
    font-weight: normal;
    font-style: normal
}</style>

<style>
@font-face {
    font-family: "my_custom_font_bold";
    src: url(data:font/woff2;base64, asdlfjsads92932super-long-byte-string-here) format("woff2");
    font-weight: normal;
    font-style: normal
}</style>
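Note that the rules above label the data as `font/woff2` with `format("woff2")`, while the source files are described as .otf. A minimal sketch of generating the rule so that the MIME type and format hint match the actual file type; `font_face_rule` and `FONT_TYPES` are hypothetical names, and whether a matching declaration changes how wkhtmltopdf renders the text is an assumption that would need testing.

```python
import base64

# MIME type and CSS format() hint for each font file extension.
FONT_TYPES = {
    ".otf": ("font/otf", "opentype"),
    ".ttf": ("font/ttf", "truetype"),
    ".woff": ("font/woff", "woff"),
    ".woff2": ("font/woff2", "woff2"),
}


def font_face_rule(family: str, font_bytes: bytes, extension: str) -> str:
    """Build an @font-face rule embedding the font as a base64 data URI."""
    mime, fmt = FONT_TYPES[extension.lower()]
    encoded = base64.b64encode(font_bytes).decode("ascii")
    return (
        "@font-face {\n"
        f'    font-family: "{family}";\n'
        f'    src: url(data:{mime};base64,{encoded}) format("{fmt}");\n'
        "    font-weight: normal;\n"
        "    font-style: normal;\n"
        "}"
    )


# Example call (the file name is a placeholder):
# with open("my_custom_font.otf", "rb") as fh:
#     print(font_face_rule("my_custom_font", fh.read(), ".otf"))
```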

I refer to them in the CSS and apply them to my elements like this:

    span {
        font-family: "my_custom_font", Helvetica, sans-serif;
    }

    b {
        font-family: "my_custom_font_bold";
    }

And then the <body> of my html will contain this structure, repeated over and over again:

<div class=offer>
    <div class="offer_banner">
        <div class=text_container>
            <div class="stateroom_text"><span>Deliver to you</span></div>
        </div>
        <div class=text_container>
            <div class=colored_heading>
                <div class=colored_heading_child><span class=colored_heading_text>My other text</span></div>
            </div>
        </div>
    </div>
    <div class="offer_top_content">
        <div class=text_container>
            <div class=greeting_text><span>Text 1</span></div>
        </div>
        <hr>
        <div class=text_container>
            <div class=offer_text><span>text 2</span></div>
        </div>
        <hr>
        <div class=text_container>
            <div class=redemption_text><span>Text 3</span></div>
        </div>
        <div class=text_container>
            <div><span class="italics_span">Text 4</span></div>
        </div>
    </div>
    <div class=logo_container>
        <div style="margin:0 20px 0 20px;text-align:center"><img
                src="data:image/svg+xml,%3Csvg%20xmlns=%27http%3A//www.w3.org/2000/svg%27%20width=%27128%27%20height=%2730%27%3E%3Crect%20fill-opacity=%270%27/%3E%3C/svg%3E"
                alt=""
                style="background-blend-mode:normal!important; background-clip:content-box!important; background-position:50% 50%!important; background-color:rgba(0,0,0,0)!important; background-image:var(--sf-img-5)!important; background-size:100% 100%!important; background-origin:content-box!important; background-repeat:no-repeat!important">
        </div>
    </div>
</div>

Is there a way I can still use my custom fonts and get PDFKit to print it like normal text?
