hitesh israni

Reputation: 1752

PDFlib: certain Unicode characters not rendering

I need to compose a PDF using PDFlib version 8, in which I need to print certain Unicode characters:

𝑎 𝑥

But they are not rendered. Instead, the following is displayed:

€

What could be the reason, and how can I render the characters?

Below is the code

$p = PDF_new();

/*  open new PDF file; insert a file name to create the PDF on disk */
if (PDF_begin_document($p, "", "") == 0) {
    die("Error: " . PDF_get_errmsg($p));
}
PDF_set_info($p, "Creator", "Abc");
PDF_set_info($p, "Author", "Abc");
PDF_set_info($p, "Title", "Test");
pdf_set_option($p, "textformat=utf8");

PDF_begin_page_ext($p, 595, 842, "");
$fontdir = '/usr/share/fonts/truetype/dejavu';
pdf_set_parameter($p, "FontOutline", "Dejavu=$fontdir/DejaVuSans.ttf");
$font = pdf_load_font($p, "Dejavu", "unicode","");

PDF_setfont($p, $font, 24.0);
PDF_set_text_pos($p, 50, 700);
pdf_show_xy($p,"dejb €",100,490);
pdf_show_xy($p,"dejb 𝑥 𝑎",200,490);
PDF_end_page_ext($p, "");

PDF_end_document($p, "");

$buf = PDF_get_buffer($p);
$len = strlen($buf);

header("Content-type: application/pdf");
header("Content-Length: $len");
header("Content-Disposition: inline; filename=hello.pdf");
print $buf;

PDF_delete($p);

Output

[Image: rendered PDF output]

Edit:

I tried using the FreeSans font instead of DejaVu, but the output did not change.

$fontdir = '/usr/share/fonts/truetype/freefont';
pdf_set_parameter($p, "FontOutline", "FreeSans=$fontdir/FreeSans.ttf");
$font = pdf_load_font($p, "FreeSans", "unicode","");

Upvotes: 0

Views: 3484

Answers (2)

Nor.Z

Reputation: 1419

For a related problem:
text like "ff" being rendered as a wrong, unrelated character.

--

What I did was:
call embedFont after loading the PDF.

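    import { PDFDocument, StandardFonts } from 'pdf-lib';

    // bytesPdf: the bytes of the existing PDF (e.g. a Uint8Array)
    const doc_Pdflib = await PDFDocument.load(bytesPdf);
    // Embed the font, asking for the "liga" (ligature) feature to be turned off
    const font = await doc_Pdflib.embedFont(StandardFonts.TimesRoman, {
      subset: true,
      features: { liga: false },
    });
    const list_page_pdflib = doc_Pdflib.getPages();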

--

You may check out the following links. They suggest the problem lies with the font in the PDF.

characters "ff" are not captured from return value of getTextContent() · Issue #11016 · mozilla/pdf.js https://github.com/mozilla/pdf.js/issues/11016

Font issue with specific character combination · Issue #871 · Hopding/pdf-lib https://github.com/Hopding/pdf-lib/issues/871

--

Actually, it wasn't the code that made it work.

The actual solution was to:
open the PDF file in a PDF editor,
make a trivial edit such as adding a space,
and save it.
-- The PDF editor somehow repairs the internally corrupted PDF file.
After that, the problem no longer appears when you run your code.

Upvotes: -1

Rainer

Reputation: 2185

You can solve your problem by using a font that contains the required glyphs. If you check the page you linked for "MATHEMATICAL ITALIC SMALL A", you will find a link to "Fonts that support U+1D44E".

As that list shows, only a few fonts support this glyph, for example "DejaVu Serif Italic". When I use DejaVu Serif Italic (DejaVuSerif-Italic.ttf) from the DejaVu package, I get the expected output:

[Image: rendered PDF output]

Of course, other fonts may also support these glyphs; you are not limited to DejaVu Serif Italic.
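
If you are unsure which code points your text actually contains, a small helper can list them so that you can look each one up in a font's coverage list. This is only a sketch in plain PHP with no PDFlib calls; the function name `utf8_codepoints` is made up for illustration, and the `\u{...}` escapes assume PHP 7 or later.

```php
<?php
// Hypothetical helper (not part of PDFlib): list the Unicode code points
// of a UTF-8 string in U+XXXX notation, using only core PHP.
function utf8_codepoints($s)
{
    // Split into characters; the /u flag makes PCRE treat $s as UTF-8.
    $chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return array(); // input was not valid UTF-8
    }
    $out = array();
    foreach ($chars as $ch) {
        $bytes = array_values(unpack('C*', $ch));
        $n = count($bytes);
        if ($n === 1) {
            $cp = $bytes[0];                            // plain ASCII
        } else {
            $cp = $bytes[0] & (0xFF >> ($n + 1));       // payload bits of the lead byte
            for ($i = 1; $i < $n; $i++) {
                $cp = ($cp << 6) | ($bytes[$i] & 0x3F); // 6 bits per continuation byte
            }
        }
        $out[] = sprintf('U+%04X', $cp);
    }
    return $out;
}

// The two characters from the question, written as escapes.
echo implode(' ', utf8_codepoints("\u{1D465} \u{1D44E}")), "\n";
// prints: U+1D465 U+0020 U+1D44E
```

Both letters lie above U+FFFF, in the Mathematical Alphanumeric Symbols block, which many general-purpose fonts do not cover.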

Just one note on your code. The line

pdf_set_option($p, "textformat=utf8");

requires PDFlib 9. Please use

PDF_set_parameter($p, "textformat", "utf8");

instead.
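
Putting both points together for PDFlib 8, the relevant part of the script could look roughly like this. It uses only the calls from the question plus the `PDF_set_parameter` form above. The alias `DejaVuSerifItalic`, the wrapper function `render_sample` and the font path are assumptions for illustration, and the `function_exists` guard is there only so that the file does not fail where the PDFlib extension is not loaded.

```php
<?php
// Sketch for PDFlib 8: use a font that covers U+1D465 and U+1D44E.
// Assumptions: font path, alias name and render_sample() are illustrative.

// The characters from the question as code point escapes (PHP 7+),
// so the encoding of the source file cannot garble them.
$text = "dejb \u{1D465} \u{1D44E}";

function render_sample($text, $fontfile)
{
    $p = PDF_new();
    if (PDF_begin_document($p, "", "") == 0) {
        die("Error: " . PDF_get_errmsg($p));
    }
    // PDFlib 8 form of the textformat setting.
    PDF_set_parameter($p, "textformat", "utf8");
    // Register the font file under an alias, then load it by that alias.
    PDF_set_parameter($p, "FontOutline", "DejaVuSerifItalic=$fontfile");
    $font = PDF_load_font($p, "DejaVuSerifItalic", "unicode", "");

    PDF_begin_page_ext($p, 595, 842, "");
    PDF_setfont($p, $font, 24.0);
    PDF_show_xy($p, $text, 100, 490);
    PDF_end_page_ext($p, "");
    PDF_end_document($p, "");

    $buf = PDF_get_buffer($p);
    PDF_delete($p);
    return $buf;
}

// Only call into PDFlib when the extension is actually available.
if (function_exists('PDF_new')) {
    $buf = render_sample($text, '/usr/share/fonts/truetype/dejavu/DejaVuSerif-Italic.ttf');
    header("Content-type: application/pdf");
    header("Content-Length: " . strlen($buf));
    header("Content-Disposition: inline; filename=hello.pdf");
    print $buf;
}
```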

Upvotes: 6
