I have one self-contained, static HTML file (it makes no network requests) that is under 900 KB, and I am using pdfkit to turn it into a single PDF of about 100 pages. The resulting PDF is 30-40 MB, which is far too large considering that each page is just text and one small image repeated 4 times.
The way I create the PDF is pretty simple.
installation:
apt-get install wkhtmltopdf -y
pip install pdfkit==1.0.0
pip install pypdf2==2.10.5
import pdfkit


def html_to_pdf(html_path: str, pdf_path: str):
    pdfkit.from_file(
        input=html_path,
        output_path=pdf_path,
        configuration=pdfkit.configuration(),
        options={
            'zoom': '0.9588',  # seemed to be the right zoom through trial and error
            'disable-smart-shrinking': '',
            'page-size': 'Letter',
            'orientation': 'Landscape',
            'margin-top': '0',
            'margin-right': '0',
            'margin-left': '0',
            'margin-bottom': '0',
            'encoding': "UTF-8",
        })


html_to_pdf(".my_html_file.html", "my_pdf_file.pdf")
I also tried shrinking the image I pull in to about 30% of its original size, but that made no difference at all to the size of the resulting PDF.
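Since shrinking the image changed nothing, one way to see what actually dominates the HTML is to total the embedded `data:` URIs by MIME type. This is only a rough diagnostic sketch using the standard library; the function name `embedded_payload_sizes` and the regular expression are made up for illustration and will not catch every possible data URI.

```python
import re
from collections import defaultdict

# Matches data URIs such as data:font/woff2;base64,AAAA... and captures the
# MIME type plus the payload up to the next quote, parenthesis or whitespace.
DATA_URI = re.compile(r"data:([\w.+-]+/[\w.+-]+)(?:;[\w=.+-]+)*,([^\"')\s]*)")


def embedded_payload_sizes(html: str) -> dict:
    """Return the total number of payload characters per MIME type."""
    totals = defaultdict(int)
    for mime, payload in DATA_URI.findall(html):
        totals[mime] += len(payload)
    return dict(totals)
```

Calling `embedded_payload_sizes(html_string)` on the markup and sorting the result by size shows whether fonts, images, or something else make up most of the file.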
What I notice about the PDFs I generate with pdfkit is that they are not really PDFs in the usual sense: you can't search the text, highlight text blocks, and so on. Every page behaves like one big image. When I print the same HTML from my browser and save it as a PDF, all of those things work.
This has to be built programmatically, so the whole process needs to be automated. Is there some pdfkit setting I'm missing?
Also worth noting: I have access to the actual string I use to build the HTML, so I don't have to read it from a file. Would that make a difference?
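For reference, this is roughly how the call would look with the string instead of a file. It is a sketch, not something I have confirmed changes anything: as far as I understand, pdfkit passes the markup to the same wkhtmltopdf binary either way, so I would only expect it to save the intermediate file. `build_options` and `html_string_to_pdf` are names made up for this example.

```python
def build_options() -> dict:
    """Return the same wkhtmltopdf options used in html_to_pdf."""
    return {
        'zoom': '0.9588',
        'disable-smart-shrinking': '',
        'page-size': 'Letter',
        'orientation': 'Landscape',
        'margin-top': '0',
        'margin-right': '0',
        'margin-left': '0',
        'margin-bottom': '0',
        'encoding': 'UTF-8',
    }


def html_string_to_pdf(html: str, pdf_path: str) -> None:
    """Render an in-memory HTML string to a PDF file."""
    # Imported here so the sketch can be loaded without pdfkit installed.
    import pdfkit

    pdfkit.from_string(input=html, output_path=pdf_path, options=build_options())
```

Usage would be `html_string_to_pdf(my_html_string, "my_pdf_file.pdf")`, where `my_html_string` stands for the markup I already hold in memory.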
I'm also open to not using pdfkit at all. I just need something that doesn't require a license.
======
Addendum - it's fonts!
I use custom fonts that are .otf files, so I run some code that base64-encodes each file. I then embed each font as its base64 string in the <head>, like this:
<style>
  @font-face {
    font-family: "my_custom_font";
    src: url(data:font/woff2;base64, asdlfjsads92932super-long-byte-string-here) format("woff2");
    font-weight: normal;
    font-style: normal
  }
</style>
<style>
  @font-face {
    font-family: "my_custom_font_bold";
    src: url(data:font/woff2;base64, asdlfjsads92932super-long-byte-string-here) format("woff2");
    font-weight: normal;
    font-style: normal
  }
</style>
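The encoding step can be sketched roughly as follows. This is not my exact code: `font_face_css` and `FONT_TYPES` are hypothetical names, and the sketch assumes the MIME type and `format()` hint should follow the file extension, so an .otf file would be declared as `font/otf` with `format("opentype")` rather than as woff2. Whether that mismatch has anything to do with the output is not something I have confirmed.

```python
import base64
from pathlib import Path

# MIME type and CSS format() hint for each supported font file extension.
FONT_TYPES = {
    ".otf": ("font/otf", "opentype"),
    ".ttf": ("font/ttf", "truetype"),
    ".woff": ("font/woff", "woff"),
    ".woff2": ("font/woff2", "woff2"),
}


def font_face_css(font_path: str, family: str) -> str:
    """Build an @font-face rule that embeds the font file as a base64 data URI."""
    path = Path(font_path)
    mime, css_format = FONT_TYPES[path.suffix.lower()]
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return (
        "@font-face {\n"
        f'  font-family: "{family}";\n'
        f'  src: url(data:{mime};base64,{encoded}) format("{css_format}");\n'
        "  font-weight: normal;\n"
        "  font-style: normal;\n"
        "}"
    )
```

The returned string can be placed inside a `<style>` element in the `<head>`, for example `font_face_css("my_custom_font.otf", "my_custom_font")`, where the file name is a placeholder.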
I refer to them in the CSS and apply them to my elements like this:
span {
  font-family: "my_custom_font", Helvetica, sans-serif;
}
b {
  font-family: "my_custom_font_bold";
}
The <body> of my HTML then contains this structure, repeated over and over:
<div class=offer>
  <div class="offer_banner">
    <div class=text_container>
      <div class="stateroom_text"><span>Deliver to you</span></div>
    </div>
    <div class=text_container>
      <div class=colored_heading>
        <div class=colored_heading_child><span class=colored_heading_text>My other text</span></div>
      </div>
    </div>
  </div>
  <div class="offer_top_content">
    <div class=text_container>
      <div class=greeting_text><span>Text 1</span></div>
    </div>
    <hr>
    <div class=text_container>
      <div class=offer_text><span>text 2</span></div>
    </div>
    <hr>
    <div class=text_container>
      <div class=redemption_text><span>Text 3</span></div>
    </div>
    <div class=text_container>
      <div><span class="italics_span">Text 4</span></div>
    </div>
  </div>
  <div class=logo_container>
    <div style="margin:0 20px 0 20px;text-align:center"><img
        src="data:image/svg+xml,%3Csvg%20xmlns=%27http%3A//www.w3.org/2000/svg%27%20width=%27128%27%20height=%2730%27%3E%3Crect%20fill-opacity=%270%27/%3E%3C/svg%3E"
        alt=""
        style="background-blend-mode:normal!important; background-clip:content-box!important; background-position:50% 50%!important; background-color:rgba(0,0,0,0)!important; background-image:var(--sf-img-5)!important; background-size:100% 100%!important; background-origin:content-box!important; background-repeat:no-repeat!important">
    </div>
  </div>
</div>
Is there a way I can keep using my custom fonts and still get pdfkit to output normal, selectable text?