Abolfazl Mohajeri

Reputation: 2007

PDFTron copies the wrong text

I want to use PDFTron, and everything works perfectly, but when I copy text from the PDF, some characters are converted to blank squares and question marks. Any idea why?

Here is my PDF.

As you can see in the screenshot: [image]

I wrote this code:

WebViewer({
    path: '/assets/plugins/pdftron',
    initialDoc: '/practical.pdf',
    fullAPI: true,
    disableLogs: true
}, document.getElementById('pdf')).then((instance) => {
    // PDFNet is only available with full API enabled
    const { PDFNet, docViewer } = instance;

    let Feature = instance.Feature;
    instance.disableFeatures([Feature.NotesPanel]);

    docViewer.on('documentLoaded', () => {
        // call methods relating to the loaded document
    });


    instance.textPopup.add({
        type: 'actionButton',
        img: '/language.svg',
        onClick: () => {
            const quads = docViewer.getSelectedTextQuads(docViewer.getCurrentPage());
            const text = docViewer.getSelectedText();
            // Use .text() so the selection is inserted as plain text, not parsed as HTML
            $("#out-pdf").text(text);
            console.log(quads);
        },
    });
});

Upvotes: 2

Views: 392

Answers (2)

Jussi Nieminen

Reputation: 151

The document does seem to cause incorrect extraction. The PDF specification does not define text extraction, so every viewer handles edge cases a little differently. In your case, the document probably contains a malformed or incomplete font or Unicode map. We have added multiple fixes to our core components, and with those fixes the text is extracted correctly. Unfortunately, the current release of WebViewer does not include them yet. We cannot give an exact schedule for when the fixes will land in WebViewer, but they should at least be part of our next major release. For now, I would try recreating the document to see if that helps. Most of the documents we see and test have no problems with extraction.
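Until a fix is available, it may help to detect when a selection contains characters that typically point to a broken Unicode map. The following is only a rough sketch, not part of the WebViewer API: `findSuspectCharacters` is a hypothetical helper, and the choice of "suspect" ranges (the U+FFFD replacement character, the Basic Multilingual Plane private use area, and stray control characters) is an assumption about how such failures usually surface.

```javascript
// Hypothetical helper (not part of WebViewer): lists characters in extracted
// text that often indicate a missing or malformed Unicode mapping in the PDF.
function findSuspectCharacters(text) {
  const suspects = [];
  let index = 0; // UTF-16 offset of the current character within `text`

  for (const ch of text) {
    const cp = ch.codePointAt(0);

    const isReplacement = cp === 0xfffd;               // U+FFFD, usually shown as a question mark
    const isPrivateUse = cp >= 0xe000 && cp <= 0xf8ff; // often rendered as a blank square
    const isControl = cp < 0x20 && ch !== '\n' && ch !== '\r' && ch !== '\t';

    if (isReplacement || isPrivateUse || isControl) {
      suspects.push({
        index,
        codePoint: 'U+' + cp.toString(16).toUpperCase().padStart(4, '0'),
      });
    }
    index += ch.length;
  }
  return suspects;
}
```

In the `onClick` handler from the question, the result of `docViewer.getSelectedText()` could be passed to this function, and a non-empty result used to warn that the copied text is probably unreliable for that document.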

Upvotes: 2

Jussi Nieminen

Reputation: 151

Could you create a ticket through our support form at https://www.pdftron.com/form/request/ and attach the affected document, so I can take a closer look and get the issue resolved faster?

Upvotes: 1
