divyanshu mishra

Reputation: 9

UTF-8 characters are showing as boxes when converting HTML to PDF

I want to convert an HTML template containing special characters to PDF, but the special characters do not appear in the output.

from io import BytesIO
from django.http import HttpResponse
from django.template.loader import get_template
from xhtml2pdf import pisa

def html2pdf(template_source, context_dict=None):
    # Render the Django template to an HTML string.
    template = get_template(template_source)
    html = template.render(context_dict or {})
    # Convert the UTF-8 encoded HTML into a PDF held in memory.
    result = BytesIO()
    pdf = pisa.CreatePDF(BytesIO(html.encode('utf-8')), result)
    if not pdf.err:
        return HttpResponse(result.getvalue(), content_type="application/pdf")
    return None

The code above is my pdf.py, and this is my HTML template, pdf.html:

<!DOCTYPE html>
<html lang="en">
<meta charset="UTF-8">
<head>
    <style>
        body {font-family: 'Josefin Slab';
        font-size: large;
        background-color: beige;}
        </style>
    <meta charset="UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Document</title>
</head>
<body>
    <h2 class="utf">This is myŐ, Ű, ő or ű✅✅ pdf file with special char</h2>
</body>
</html>

When I convert this to a PDF, the output shows:

This is my■, ■, ■ or ■■■■■■■■■■■■■ pdf file with special......

What to do now?

Upvotes: -1

Views: 927

Answers (1)

K J

Reputation: 11738

As noted in the comments, you are using characters that do not exist in the font, so use a different font. However, also see the notes below.
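
One way to see which characters the chosen font has to cover is to list every non-ASCII character in the rendered HTML along with its code point and Unicode name. The sketch below uses only the standard library. The helper name `list_non_ascii` is hypothetical, and it does not inspect the font itself, so comparing the list against the font's coverage is still a manual step.

```python
import unicodedata


def list_non_ascii(text):
    """Return (char, code point, Unicode name) for each distinct non-ASCII character."""
    seen = []
    for ch in text:
        # Keep the first occurrence of each character outside the ASCII range.
        if ord(ch) > 127 and ch not in seen:
            seen.append(ch)
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in seen]


# The heading text from the question's pdf.html.
heading = "This is myŐ, Ű, ő or ű✅✅ pdf file with special char"
for entry in list_non_ascii(heading):
    print(entry)
```

Any character in that list that the font lacks is a candidate for the boxes seen in the output.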

A PDF with these characters correctly embedded will still display them in the browser's PDF viewer, but a conventional PDF viewer does not handle them well.

Not every character is available, even in a full universal font. This applies in particular to coloured HTML objects such as emoji or your ✅: those are generated by browser fonts, so they need to be converted to an image with underlying text. That two-for-one combination is problematic in a PDF, and whether it works with a given font depends on the PDF writer. A safer fudge is to use the square root symbol (√) instead.
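
The substitution suggested above can be applied to the rendered HTML before it is handed to the PDF writer. This is a minimal sketch using only the standard library. The helper name `replace_unsupported` and the mapping are assumptions for illustration, and the replacement character still has to exist in whichever font is chosen.

```python
# Hypothetical mapping: swap the emoji check mark for the square root sign.
FALLBACKS = {
    "\u2705": "\u221a",  # WHITE HEAVY CHECK MARK -> SQUARE ROOT
}


def replace_unsupported(html, fallbacks=FALLBACKS):
    """Replace each character listed in `fallbacks` with its substitute."""
    return html.translate(str.maketrans(fallbacks))


print(replace_unsupported("or \u0171\u2705\u2705 pdf file"))
```

In the question's html2pdf function, this would be applied to the string returned by template.render(...) before it is encoded and passed to pisa.CreatePDF.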

Side note: in some Scandinavian countries a tick can mean wrong rather than right; see https://en.wikipedia.org/wiki/Check_mark

Upvotes: 1
