Ahmad

Problems extracting Arabic text from PDF files using the pypdfium2 tool

I'm using pypdfium2 to extract Arabic text from PDF files. It works well for some PDFs, although the output needs some post-processing. For other PDF files, however, the result is very poor. For example:
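For reference, here is a minimal sketch of this kind of extraction plus one common post-processing step. It assumes the pypdfium2 4.x API (`PdfDocument`, `get_textpage`, `get_text_range`); `extract_page_texts` and `normalize_arabic` are hypothetical helper names. Some PDFs yield Arabic presentation forms (U+FB50 to U+FDFF, U+FE70 to U+FEFF) instead of base letters, and Unicode NFKC normalization maps those back to the base letters.

```python
import unicodedata
from typing import List


def extract_page_texts(path: str) -> List[str]:
    """Return the raw text of every page, in the order pdfium reports it."""
    # Imported lazily (assumption: pypdfium2 4.x) so normalize_arabic works without it.
    import pypdfium2 as pdfium

    pdf = pdfium.PdfDocument(path)
    texts = []
    try:
        for index in range(len(pdf)):
            page = pdf[index]
            textpage = page.get_textpage()
            texts.append(textpage.get_text_range())
            textpage.close()
            page.close()
    finally:
        pdf.close()
    return texts


def normalize_arabic(text: str) -> str:
    """Map Arabic presentation forms (e.g. U+FEDF) back to base letters (e.g. U+0644)."""
    # NFKC also folds other compatibility characters (full-width forms, some ligatures).
    return unicodedata.normalize("NFKC", text)
```

A typical call would be `[normalize_arabic(t) for t in extract_page_texts("sample.pdf")]`, where `sample.pdf` is a placeholder path. Note that this fixes character shapes only; it does not change the order in which words come out.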

الطائر الذي أوقعه جماله في الأسر

is being extracted as:

الذيالطائر جمالهأوقعه في الأسر

I noticed that even when I simply copy this sentence from the original PDF file and paste it into a text file, I get the same result. Do you know why this happens and how to fix it? I need to use the pypdfium2 tool because I'm writing a report about it.
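One plausible explanation, not confirmed for this particular file: in some PDFs the words of a line are drawn as separate text objects in an order that differs from the visual right-to-left order, and the gaps between words come from positioning rather than from space characters. Both text extraction and copy-paste follow the stored order, which would explain why copying from a viewer gives the same result. If that is the cause, one workaround is to rebuild the line from character positions. The sketch below assumes a single line of purely right-to-left text and the pypdfium2 4.x `PdfTextPage` methods `count_chars`, `get_text_range` and `get_charbox`; `page_chars`, `rebuild_rtl_line` and the `min_gap` threshold are hypothetical names and values that would need tuning per document.

```python
from typing import List, Tuple

# (character, left, right): horizontal extent of one glyph in PDF user-space units.
Char = Tuple[str, float, float]


def page_chars(textpage) -> List[Char]:
    """Collect each character with its box from a pypdfium2 PdfTextPage (4.x API assumed)."""
    chars: List[Char] = []
    for index in range(textpage.count_chars()):
        ch = textpage.get_text_range(index, 1)
        left, _bottom, right, _top = textpage.get_charbox(index)
        chars.append((ch, left, right))
    return chars


def rebuild_rtl_line(chars: List[Char], min_gap: float = 1.0) -> str:
    """Reorder one line of right-to-left glyphs by position and reinsert word spaces."""
    # Ignore stored spaces: word breaks are recovered from the gaps between glyphs.
    glyphs = [c for c in chars if c[0] and not c[0].isspace()]
    # Right-to-left reading order: the glyph with the largest right edge comes first.
    glyphs.sort(key=lambda c: c[2], reverse=True)
    words: List[str] = []
    current = ""
    prev_left = None
    for ch, left, right in glyphs:
        # A horizontal gap wider than min_gap is treated as a word boundary.
        if prev_left is not None and prev_left - right > min_gap:
            words.append(current)
            current = ""
        current += ch
        prev_left = left
    if current:
        words.append(current)
    return " ".join(words)
```

For a single-line page this could be called as `rebuild_rtl_line(page_chars(page.get_textpage()))`. Digits and embedded Latin text run left to right, so this simple sort would reverse them, and a multi-line page would first need its characters grouped by vertical position.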
