Bayram

Reputation: 221

How can I use a PDF document as the data source for an API?

I am trying to build an iOS dictionary app in my language using React Native and JavaScript.
I have a PDF document (it's an actual text file) that contains most words along with their definitions.

How can I use that file as the data source for my API?
What would be the most efficient way to do it?

!https://github.com/bayram96/stack-over-flow-images/blob/master/IMG_3525.jpeg

Upvotes: 2

Views: 86

Answers (1)

venimus

Reputation: 6067

Short answer: you cannot do it. Or at least, it is too complex and the effort won't pay off.

A PDF is not a text file. It is more like compressed HTML+CSS. I won't go into the details of the format.

Basically, the format optimizes the contents, so what you see when you open the file in a hex editor (or Notepad) won't always match the visible text. In fact, a match is the rare case.

Along with images and other metadata, a PDF also embeds the fonts, and usually only the part of each font that is actually used. Furthermore, the text in it is generally not stored as UTF-8, so non-Latin characters will not show up even in Notepad. Dictionaries in particular contain many special characters that have no equivalent Latin letters.
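To illustrate the font issue, here is a minimal sketch, not a real PDF reader. It assumes a subset font whose glyph codes are arbitrary numbers, plus a lookup table that maps them back to characters (PDFs can carry such a table, but not every file has one). The codes, the table, and the sample word are all made up for the example.

```javascript
// Hypothetical subset font: only the glyphs the document uses,
// numbered arbitrarily. These codes are NOT Unicode code points.
const toUnicode = new Map([
  [1, 's'], [2, 'ö'], [3, 'z'], [4, 'l'], [5, 'ü'], [6, 'k'],
]);

// What the page content might store for one word: glyph codes, not letters.
const storedCodes = [1, 2, 3, 4, 5, 6];

// Map each glyph code back to a character; unknown codes become U+FFFD.
function decodeGlyphs(codes, table) {
  return codes
    .map((code) => (table.has(code) ? table.get(code) : '\uFFFD'))
    .join('');
}

// Reading the codes as if they were character codes gives control characters.
const naive = String.fromCharCode(...storedCodes);

// Reading them through the font's table gives the visible word.
const decoded = decodeGlyphs(storedCodes, toUnicode);

console.log(JSON.stringify(naive));
console.log(decoded);
```

Without the table, or with a table that misses some glyphs, the word cannot be recovered from the stored codes alone.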

In addition, even when a run of characters does appear as text, the pieces might not be in the right order in the file, because the format also stores their coordinates on the page.
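A rough sketch of what that means for extraction: assume a parser hands you text fragments with page coordinates, in whatever order they were stored. The fragment shape (`text`, `x`, `y`), the sample values, and the `tolerance` parameter are assumptions for illustration; the sketch also assumes `y` grows upward, as in PDF page space, and a single-column layout.

```javascript
// Hypothetical fragments in storage order, which is not reading order.
const fragments = [
  { text: 'definition', x: 120, y: 700 },
  { text: 'second', x: 40, y: 680 },
  { text: 'word', x: 40, y: 700 },
  { text: 'line', x: 120, y: 680 },
];

// Rebuild reading order: top of the page first, then left to right.
function toLines(frags, tolerance = 2) {
  // Larger y is higher on the page, so sort y descending, then x ascending.
  const sorted = [...frags].sort((a, b) => b.y - a.y || a.x - b.x);
  const lines = [];
  for (const frag of sorted) {
    const last = lines[lines.length - 1];
    // Fragments whose y values are within the tolerance share a line.
    if (last && Math.abs(last.y - frag.y) <= tolerance) {
      last.items.push(frag);
    } else {
      lines.push({ y: frag.y, items: [frag] });
    }
  }
  // Within each line, order by x and join the pieces with spaces.
  return lines.map((line) =>
    line.items
      .sort((a, b) => a.x - b.x)
      .map((frag) => frag.text)
      .join(' ')
  );
}

const lines = toLines(fragments);
console.log(lines); // [ 'word definition', 'second line' ]
```

A two-column dictionary page would already break this, since both columns share the same `y` values, which is part of why the output tends to be inconsistent.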

Maybe you could find a third-party PDF parser (software or service) that extracts the textual data with more consistent output. Then you could run your files through it and convert them. But several of the issues I described will still be there.
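If you do get plain text out of such a tool, the conversion step could look roughly like this. It is only a sketch: it assumes one entry per line in the form `word - definition`, which your file may not follow, and the sample entries, the separator, and the `buildIndex` helper are all hypothetical.

```javascript
// Hypothetical output of a PDF-to-text tool, one entry per line.
const extractedText = [
  'apple - a round fruit that grows on a tree',
  '',
  'book - a set of printed pages bound together',
  'a line without a separator',
].join('\n');

// Turn the extracted text into a word -> definition index.
// Lines that do not match the assumed format are collected for manual review.
function buildIndex(text, separator = ' - ') {
  const index = new Map();
  const skipped = [];
  for (const rawLine of text.split('\n')) {
    const line = rawLine.trim();
    if (line === '') continue; // ignore blank lines
    const pos = line.indexOf(separator);
    if (pos <= 0) {
      skipped.push(line); // no separator, or nothing before it
      continue;
    }
    const word = line.slice(0, pos).trim().toLowerCase();
    const definition = line.slice(pos + separator.length).trim();
    index.set(word, definition);
  }
  return { index, skipped };
}

const { index, skipped } = buildIndex(extractedText);

// The index can be saved as JSON and served by an API or bundled with the app.
const json = JSON.stringify(Object.fromEntries(index), null, 2);

console.log(index.get('apple'));
console.log(skipped);
```

Doing this once, offline, and shipping the resulting JSON keeps the PDF out of the app entirely. Expect to fix the skipped lines by hand.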

Upvotes: 1
