camslaz

Reputation: 239

Get dimensions and coordinates of textfields in PDF

Is it possible to get the X/Y coordinates and height/width of all text fields in a PDF document using PHP or a Linux library? I am using PDFTK to extract all text fields in the PDF, but it doesn't give me coordinate or dimension information. If not, is it possible to traverse the PDF document and calculate the x/y and height/width data for the text fields?

Upvotes: 6

Views: 3183

Answers (2)

simon

Reputation: 16340

Yeah, it's not too hard. The best tool I know for the job is pdfminer. It's Python, but if you don't want to use Python, you can just dump the PDF info in XML format (pdfminer ships a dumppdf.py tool for that) and parse that with your weapon of choice :) Reply if you have trouble :)
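For the Python route, here is a minimal sketch of what that could look like, assuming pdfminer.six is installed and the form is a standard AcroForm. The function names `rect_to_box` and `iter_field_boxes` are made up for illustration. It relies on each field (or its widget child) carrying a /Rect entry of the form [x0 y0 x1 y1], which is how PDF stores annotation rectangles; filtering down to text fields only (/FT /Tx) is left out to keep it short.

```python
def rect_to_box(rect):
    """Convert a PDF /Rect [x0, y0, x1, y1] into x, y, width and height."""
    x0, y0, x1, y1 = (float(v) for v in rect)
    # The two corners may come in either order, so normalise them.
    return {
        "x": min(x0, x1),
        "y": min(y0, y1),
        "width": abs(x1 - x0),
        "height": abs(y1 - y0),
    }


def iter_field_boxes(path):
    """Yield (field name, box) for every form field that has a /Rect.

    Assumes pdfminer.six; imported lazily so rect_to_box works without it.
    """
    from pdfminer.pdfdocument import PDFDocument
    from pdfminer.pdfparser import PDFParser
    from pdfminer.pdftypes import resolve1

    with open(path, "rb") as fp:
        doc = PDFDocument(PDFParser(fp))
        acroform = resolve1(doc.catalog.get("AcroForm"))
        if not acroform:
            return  # no interactive form in this document
        pending = list(resolve1(acroform.get("Fields")) or [])
        while pending:
            field = resolve1(pending.pop())
            # Fields can nest; widget children sit under /Kids.
            pending.extend(resolve1(field.get("Kids")) or [])
            rect = resolve1(field.get("Rect"))
            if rect is not None:
                # /T is the partial field name; a child widget may lack it.
                yield field.get("T"), rect_to_box([resolve1(v) for v in rect])
```

You would call it as `for name, box in iter_field_boxes("form.pdf"): print(name, box)`, where form.pdf is a placeholder. The values are in PDF user space: points (1/72 inch) with the origin at the bottom-left of the page, so y grows upwards.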

Upvotes: 0

mario

Reputation: 145512

It's possible, but hardly doable.

You can open PDF documents in PHP using FPDI. It generates an abstract tree of PDF objects in memory. TCPDF and FPDF can save it back.

However, traversing said tree and finding the correct attributes is very. (I accidentally the verb.)

Now, the PDF format is actually human-readable, and it would certainly contain the coordinates in a readable form (mostly in points, IIRC). So you might be able to find them with a simple regex, if you only knew where to look. Some nodes just need to be gzuncompress()ed first, and you are not attempting to modify the document or save it back anyway. So try FPDI and print_r() to devise a strategy.
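As for where to look: form field widgets keep their position in a /Rect [llx lly urx ury] entry, and text fields are marked with /FT /Tx. A rough sketch of the regex idea, in Python rather than PHP and using only the standard library, might look like the following. The function name `find_text_field_rects` is invented for this example, and it assumes the field objects are stored uncompressed with /FT, /T and /Rect in the same object; anything inside compressed object streams, or split between a parent field and its child widgets, will be missed.

```python
import re

# One indirect object: "<num> <gen> obj ... endobj".
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.S)
# /Rect [llx lly urx ury]
RECT_RE = re.compile(
    rb"/Rect\s*\[\s*([-+.\d]+)\s+([-+.\d]+)\s+([-+.\d]+)\s+([-+.\d]+)\s*\]"
)
# /T (field name); escaped parentheses inside names are not handled.
NAME_RE = re.compile(rb"/T\s*\((.*?)\)", re.S)
TEXT_FIELD_RE = re.compile(rb"/FT\s*/Tx\b")


def find_text_field_rects(data):
    """Scan raw PDF bytes for text-field objects and return their boxes."""
    results = []
    for match in OBJ_RE.finditer(data):
        body = match.group(3)
        if not TEXT_FIELD_RE.search(body):
            continue  # not a text field
        rect = RECT_RE.search(body)
        if rect is None:
            continue  # field without its own rectangle
        name = NAME_RE.search(body)
        llx, lly, urx, ury = (float(v) for v in rect.groups())
        results.append({
            "name": name.group(1).decode("latin-1") if name else None,
            "x": llx,
            "y": lly,
            "width": urx - llx,
            "height": ury - lly,
        })
    return results


# Hand-written sample objects, not taken from a real document.
SAMPLE = (
    b"1 0 obj\n<< /Type /Annot /Subtype /Widget /FT /Tx "
    b"/T (first_name) /Rect [ 72 700 272 720 ] >>\nendobj\n"
    b"2 0 obj\n<< /FT /Btn /T (ok) /Rect [10 10 20 20] >>\nendobj\n"
)
print(find_text_field_rects(SAMPLE))
```

On a real file you would pass the bytes read from disk. If nothing turns up, the objects are probably compressed, which is where a proper parser becomes the easier option.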

Upvotes: -1
