Ted Romanus

Reputation: 582

Unknown PDF encoding

I have a PDF document with Ukrainian text (Cyrillic letters). But when I copy text from it and paste it into an input field, I get something like this:

ȿɄɈɇɈɆȱɄɈ-ɋɌȺɌɂɋɌɂɑɇɂɃ ȺɇȺɅȱɁ ȼɂȻȱɊɄɈȼɈȽɈ

None of the encoding detectors or converters I tried helped.

What is this encoding, and how can I copy the normal Ukrainian text?

Upvotes: 0

Views: 133

Answers (1)

lecstor

Reputation: 5707

The PDF was most likely created with an embedded font subset and no ToUnicode mapping. The character codes used in the PDF's content are mapped to glyphs embedded in the file, which is why the text displays correctly, but nothing maps those codes back to regular Unicode code points, so copying them produces gibberish. Without that mapping, the only reliable way to extract the original contents is some form of OCR.
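To see that the pasted text is not Cyrillic in some other encoding, you can inspect its code points. This is only a rough diagnostic sketch using the sample from the question; the helper names `is_cyrillic` and `describe` are made up for illustration and are not part of any PDF library.

```python
import unicodedata

# Text pasted from the PDF, copied from the question.
pasted = "ȿɄɈɇɈɆȱɄɈ-ɋɌȺɌɂɋɌɂɑɇɂɃ ȺɇȺɅȱɁ ȼɂȻȱɊɄɈȼɈȽɈ"


def is_cyrillic(ch: str) -> bool:
    """True if the code point lies in the basic Cyrillic block (U+0400..U+04FF)."""
    return 0x0400 <= ord(ch) <= 0x04FF


def describe(text: str):
    """Return (character, 'U+XXXX', Unicode name) for each distinct letter."""
    rows = []
    for ch in sorted(set(text)):
        if ch.isalpha():
            name = unicodedata.name(ch, "<unnamed>")
            rows.append((ch, f"U+{ord(ch):04X}", name))
    return rows


# List what the clipboard actually contains.
for ch, code, name in describe(pasted):
    print(ch, code, name)

# Check whether any pasted character is a Cyrillic code point at all.
print("contains Cyrillic:", any(is_cyrillic(ch) for ch in pasted))
```

If the names that come out are Latin letters rather than Cyrillic ones, the clipboard holds code points the viewer derived from the font's internal codes, so a charset converter has nothing meaningful to convert.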

Upvotes: 1
