Bonnard

Reputation: 389

convert/normalize special characters when using jspdf

I'm using the jsPDF library (v1.4.1) to convert text to PDF. The output is sometimes ugly and unreadable because the text contains special characters, such as:

the left single quotation mark (U+2018), the right single quotation mark (U+2019), symbols like →, or the ı in Kadıköy. How can I sanitize or normalize such text? Or is there an option in jsPDF that I can use to fix this problem?

Update:

To reproduce the problem, use the string '→Kadıköy' on line 9 of the example at https://parall.ax/products/jspdf. You will see that the arrow is converted to !’ and the ı is converted to 1.

(FYI, Kadıköy is the name of a city: https://en.wikipedia.org/wiki/Kad%C4%B1k%C3%B6y)

Upvotes: 5

Views: 15969

Answers (3)

Bharata

Reputation: 14175

We can read here:

jsPDF supports finally UTF-8 by having the ability to use custom fonts.

The underlying problem is how PDF works. A PDF needs a font that can display the correct glyphs. This must be either a system font (available to the PDF reader) or an embedded font. Every single character in the PDF needs a font that contains it, so for each word in another language within the same PDF you have to set a suitable font.

Some TTF fonts were created for specific scripts, but not all of them were created correctly, even though a single standard technology is behind them. Also, not every TTF font created for a specific script can display it in a PDF. For example, the font "Devanagari", which I found on the internet, should support all Hindi letters, but it failed completely.

So we have to find suitable TTF fonts. I found some: in your case, for the string "‘→Kadıköy’", you could use "Courier New" or "Arial Unicode MS".
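
As a rough illustration of the point about glyph coverage, the sketch below lists the characters in a string that would probably need a custom font. It rests on a simplifying assumption that is not from this answer: that the built-in fonts only cover code points up to U+00FF. The function name `charsNeedingCustomFont` is hypothetical.

```javascript
// Heuristic sketch: report each distinct character above U+00FF.
// Assumption: the built-in PDF fonts cover only roughly the Latin-1 range.
function charsNeedingCustomFont(text) {
    var seen = {};
    var result = [];

    // Array.from iterates by code point, so astral characters stay intact.
    Array.from(text).forEach(function (ch) {
        if (ch.codePointAt(0) > 0xFF && !seen[ch]) {
            seen[ch] = true;
            result.push(ch);
        }
    });

    return result;
}
```

For the string from the question, this flags the two quotation marks, the arrow, and the dotless ı, but not the ö, which lies inside the assumed range.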

I searched for each character from your question and found the following lists:

→ – Font support for "Rightwards arrow" (U+2192)

ı – Font support for "Latin small letter dotless I" (U+0131)

‘ – Font support for "Left single quotation mark" (U+2018)

’ – Font support for "Right single quotation mark" (U+2019)

ö – Font support for "Latin small letter o with diaeresis" (U+00F6)
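
To look up font support for other characters in the same way, you first need their code points. A minimal sketch (the helper name `toCodePointLabels` is hypothetical) that turns a string into a list of U+XXXX labels:

```javascript
// Convert each character of a string into its "U+XXXX" code point label.
function toCodePointLabels(text) {
    return Array.from(text).map(function (ch) {
        var hex = ch.codePointAt(0).toString(16).toUpperCase();

        // Pad to at least four hex digits, as in the usual Unicode notation.
        while (hex.length < 4) {
            hex = '0' + hex;
        }

        return 'U+' + hex;
    });
}
```

Each label can then be used to search for fonts that contain that character.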

Solution for most languages of the world

I have created an application that can create PDFs for most of the world's languages.

How to use it:

  1. First, download and extract the free TTF font "Arial Unicode MS".
  2. Start the snippet below and choose the extracted TTF font "Arial Unicode MS" from your folder.
  3. Write the text in your language and click the "Create PDF" button.
  4. The PDF will be downloaded and you can open it.

In some cases your language may not be supported by the TTF font "Arial Unicode MS". The full list of supported languages can be found here. In that case you have to find a suitable TTF font yourself. But be careful: in my experience, a font under 100 kb does not work with jsPDF (see the beginning of my post).
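
The two practical checks mentioned here (the file must be a TTF, and very small fonts tend to fail) could be sketched as a small pre-check before handing the file to jsPDF. This is only an illustration: the name `checkFontFile` is hypothetical, and reading "100 kb" as 100 × 1024 bytes is an assumption.

```javascript
// Assumption: "100 kb" is interpreted as 100 * 1024 bytes.
var MIN_FONT_BYTES = 100 * 1024;

// Returns 'ok' or a short description of the likely problem.
// In a browser, pass file.name and file.size from the selected File.
function checkFontFile(name, sizeInBytes) {
    // Take the text after the last dot so names with several dots still work.
    var extension = name.split('.').pop().toLowerCase();

    if (extension !== 'ttf') {
        return 'not a .ttf file';
    }
    if (sizeInBytes < MIN_FONT_BYTES) {
        return 'font file is small and may not work';
    }
    return 'ok';
}
```

The size threshold reflects only the experience reported above, not a documented jsPDF limit, so treat the second message as a warning rather than an error.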

The application

var fontInBase64 = '',
    fileName = '',
    message = document.querySelector('div'),
    txtForPdf = document.querySelector('textarea'),
    errorStr = '<b style="color:red">Please select a font file!</b>';

function readFile()
{
    var file = document.querySelector('input[type=file]').files[0],
        reader = new FileReader();

    // Use the text after the last dot so names like "my.font.ttf" are accepted.
    if(file && file.name.split('.').pop().toLowerCase() != 'ttf')
    {
        message.innerHTML = errorStr;
        return;
    }

    if(txtForPdf.value.replace(/\s+/g, '').length < 1)
    {
        message.innerHTML = '<b style="color:red">Please write some Text!</b>';
        return;
    }

    reader.onloadend = function()
    {
        fontInBase64 = reader.result.split(',')[1];
        fileName = file.name.replace(/\s+/g, '-');

        createPDF(fileName, fontInBase64);
    }

    if(file) reader.readAsDataURL(file);
    else message.innerHTML = errorStr;
}


function createPDF(fileName, fontInBase64)
{
    var doc = new jsPDF('p','mm','a4'), // comma, so the names below are local variables, not implicit globals
        fileNameWithoutExtension = fileName.split('.')[0],
        lMargin = 15, // left margin in mm
        rMargin = 15, // right margin in mm
        pdfInMM = 210; // width of A4 in mm

    doc.addFileToVFS(fileName, fontInBase64);
    doc.addFont(fileName, fileNameWithoutExtension, 'normal');

    doc.setFont(fileNameWithoutExtension);
    doc.setFontSize(14);
    var splitParts = doc.splitTextToSize(txtForPdf.value, (pdfInMM - lMargin - rMargin));
    doc.text(15, 15, splitParts);

    doc.save('test.pdf');
}

function setHindiToTextArea()
{
    txtForPdf.value =
    "हिन्दी विश्व की एक प्रमुख भाषा है एवं भारत की राजभाषा है। केंद्रीय स्तर पर भारत में दूसरी आधिकारिक भाषा अंग्रेजी है। यह हिन्दुस्तानी भाषा की एक मानकीकृत रूप है जिसमें संस्कृत के तत्सम तथा तद्भव शब्द का प्रयोग अधिक हैं और अरबी-फ़ारसी शब्द कम हैं। हिन्दी संवैधानिक रूप से भारत की प्रथम राजभाषा और भारत की सबसे अधिक बोली और समझी जाने वाली भाषा है। हालांकि, हिन्दी भारत की राष्ट्रभाषा नहीं है क्योंकि भारत का संविधान में कोई भी भाषा को ऐसा दर्जा नहीं दिया गया था। चीनी के बाद यह विश्व में सबसे अधिक बोली जाने वाली भाषा भी है। विश्व आर्थिक मंच की गणना के अनुसार यह विश्व की दस शक्तिशाली भाषाओं में से एक है। हिन्दी और इसकी बोलियाँ सम्पूर्ण भारत के विविध राज्यों में बोली जाती हैं। भारत और अन्य देशों में भी लोग हिन्दी बोलते, पढ़ते और लिखते हैं। फ़िजी, मॉरिशस, गयाना, सूरीनाम की और नेपाल की जनता भी हिन्दी बोलती है। 2001 की भारतीय जनगणना में भारत में ४२ करोड़ २० लाख लोगों ने हिन्दी को अपनी मूल भाषा बताया। भारत के बाहर, हिन्दी बोलने वाले संयुक्त राज्य अमेरिका में 648,983; मॉरीशस में ६,८५,१७०; दक्षिण अफ्रीका में ८,९०,२९२; यमन में २,३२,७६०; युगांडा में १,४७,०००; सिंगापुर में ५,०००; नेपाल में ८ लाख; जर्मनी में ३०,००० हैं। न्यूजीलैंड में हिन्दी चौथी सर्वाधिक बोली जाने वाली भाषा है";
}
<script src="https://cdnjs.cloudflare.com/ajax/libs/jspdf/1.4.1/jspdf.min.js" crossorigin="anonymous"></script>
<input type="file" onchange="message.innerHTML='&nbsp;'"><br><br>
<textarea rows="4" cols="75">‘→Kadıköy’</textarea>
<div>&nbsp;</div>
<input type="button" value="Create PDF with UTF support" onclick="readFile()">
<br>
<i>For example</i>:<br><a href="#" onclick="setHindiToTextArea()"><b>Click on this line if you want to set Hindi text to the textarea.</b></a>
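
The step in `readFile()` that matters most for jsPDF is extracting the Base64 payload from the data URL produced by `readAsDataURL`. A slightly more defensive sketch of that one step, with a hypothetical helper name `base64FromDataUrl`:

```javascript
// A data URL looks like "data:<mime>;base64,<payload>".
// addFileToVFS is given only the payload after the first comma.
function base64FromDataUrl(dataUrl) {
    var commaIndex = dataUrl.indexOf(',');

    if (commaIndex === -1) {
        throw new Error('Not a data URL');
    }

    return dataUrl.slice(commaIndex + 1);
}
```

It behaves like `reader.result.split(',')[1]` for well-formed input, but fails loudly instead of returning `undefined` when the input is not a data URL.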

Upvotes: 6

Igor

Reputation: 263

IMHO, mico's answer is fine; just replace the PTSans font with the one you use (Base64-encoded). See this jsfiddle: https://jsfiddle.net/o0m9pzyv/12/

var PTSans = ...
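
One way to produce that Base64 string outside the browser is a few lines of Node.js. This is a sketch under the assumption that Node.js is available; the helper name `encodeFontToBase64` is hypothetical.

```javascript
// Node.js only: Buffer is not available in browsers.
// Accepts a Buffer, Uint8Array, or array of byte values.
function encodeFontToBase64(fontBytes) {
    return Buffer.from(fontBytes).toString('base64');
}

// Usage sketch (the path is a placeholder for your own font file):
// var fs = require('fs');
// var PTSans = encodeFontToBase64(fs.readFileSync('PTSans.ttf'));
```

The resulting string is what gets passed to `addFileToVFS` as the second argument.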

Upvotes: 3

mico

Reputation: 12748

You can do this by importing a font that supports your special characters.

The basic.js file in the examples shows how to apply it.

(The example uses Cyrillic letters.)

function demoUsingTTFFont() {
    //https://fonts.google.com/specimen/PT+Sans
    var PTSans = "......"; // place the long Base64 string of the font file here
    var doc = new jsPDF();

    doc.addFileToVFS("PTSans.ttf", PTSans);
    doc.addFont('PTSans.ttf', 'PTSans', 'normal');

    doc.setFont('PTSans'); // set font
    doc.setFontSize(10);
    doc.text("А ну чики брики и в дамки!", 10, 10);

    doc.save('test.pdf');
}

For a font family, have a look at Google's Noto.

Source:

https://github.com/MrRio/jsPDF/issues/12 (scroll to the bottom)

Upvotes: 4
