Reputation: 89
Hi, I'm working on an app that parses PDF data for viewing on mobile devices. I'm looking for a way to scan a PDF file for specific text and get the x and y coordinates of that text block. Is that even possible? I'm working on a Linux server with PHP, but I'm flexible and will use whatever it takes to get this working. Thanks.
Upvotes: 1
Views: 570
Reputation: 76
Commercial options: TET, pdfToolbox, and Adobe PDF Library.
All three are fairly mature. TET is built specifically for text extraction. pdfToolbox is a general-purpose SDK for analyzing and manipulating PDFs, and it includes a dedicated text extraction feature that reports the coordinates of text on the page. Adobe PDF Library is more of a general-purpose development tool: it offers many low-level features, but you would have to write the code that finds text, words, or characters and pulls out their coordinates.
Disclaimer: I work for callas software, so my view on pdfToolbox may be biased.
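To give a rough idea of the "find the text and pull out the coordinates" step, here is a minimal Python sketch. It does not use any of the products above and does not read a PDF itself. It assumes some extraction tool has already produced a list of words with their bounding boxes for one page. The `Word` type, the `find_phrase` function, and the sample data are all made up for illustration.

```python
from typing import List, NamedTuple, Tuple


class Word(NamedTuple):
    """One extracted word and its bounding box (hypothetical structure)."""
    text: str
    x0: float  # left
    y0: float  # bottom
    x1: float  # right
    y1: float  # top


BBox = Tuple[float, float, float, float]


def find_phrase(words: List[Word], phrase: str) -> List[BBox]:
    """Return one bounding box (x0, y0, x1, y1) per occurrence of `phrase`.

    Matching is case-insensitive and works on whole words in reading order.
    """
    targets = phrase.lower().split()
    hits: List[BBox] = []
    if not targets:
        return hits
    n = len(targets)
    for i in range(len(words) - n + 1):
        window = words[i:i + n]
        if [w.text.lower() for w in window] == targets:
            # Union of the word boxes gives the box of the whole phrase.
            hits.append((
                min(w.x0 for w in window),
                min(w.y0 for w in window),
                max(w.x1 for w in window),
                max(w.y1 for w in window),
            ))
    return hits


# Made-up sample data standing in for one page of extractor output.
page_words = [
    Word("Invoice", 72.0, 700.0, 110.0, 712.0),
    Word("Total", 72.0, 680.0, 98.0, 692.0),
    Word("Due:", 101.0, 680.0, 124.0, 692.0),
    Word("42.00", 130.0, 680.0, 160.0, 692.0),
]

print(find_phrase(page_words, "total due:"))
# prints [(72.0, 680.0, 124.0, 692.0)]
```

Note that PDF coordinates by default have their origin at the bottom-left of the page and are measured in points (1/72 inch), so for display on a mobile screen you will probably need to flip the y axis and scale the values.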
Upvotes: 3