I'm using pypdfium2 to extract Arabic text from PDF files. It works well for some PDFs, although the output needs some post-processing. For other PDF files, however, the result is badly garbled. For example:
الطائر الذي أوقعه جماله في الأسر
is being extracted as:
الذيالطائر جمالهأوقعه في الأسر
I noticed that even when I simply copy this sentence from the original PDF file and paste it into a text file, I get the same result. Do you know why this happens and how to fix it? I need to use pypdfium2 because I'm writing a report about it.
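To narrow the problem down, here is a small check I put together on the two strings above. It is only a sketch using the standard library; the helper names `letter_counts`, `same_letters`, and `missing_spaces` are my own and are not part of pypdfium2. It tests whether the extracted text contains exactly the same letters as the expected text and how many spaces were lost.

```python
from collections import Counter

def letter_counts(text):
    """Count every non-whitespace character in text."""
    return Counter(ch for ch in text if not ch.isspace())

def same_letters(expected, extracted):
    """True if both strings contain the same letters, ignoring order and spaces."""
    return letter_counts(expected) == letter_counts(extracted)

def missing_spaces(expected, extracted):
    """Number of whitespace characters present in expected but absent in extracted."""
    count = lambda s: sum(ch.isspace() for ch in s)
    return count(expected) - count(extracted)

# The sentence as it appears in the PDF, and as it comes out of extraction.
expected = "الطائر الذي أوقعه جماله في الأسر"
extracted = "الذيالطائر جمالهأوقعه في الأسر"

print("same letters:", same_letters(expected, extracted))
print("missing spaces:", missing_spaces(expected, extracted))
```

If the letters match and only spaces are missing, then no characters are being lost: neighbouring words are being swapped and joined. Together with the copy-and-paste behaviour, that would suggest the order comes from how the text is stored in the PDF rather than from pypdfium2 dropping anything, but I am not sure about that.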