Extracting form data from PDF (library or utility)

Question

I would like to extract the form data from a PDF using a library, preferably a free software library that is packaged in Ubuntu.

For example, let's say I have an HTML form, but I would also like users to be able to submit a filled-out PDF form instead of the HTML form.

So, what I'm looking for is a library (or simple CLI utility) that takes a PDF as input, and allows me to extract the filled-out fields by name, much like with HTML.

I have tried pdftotext, but that doesn't really preserve the form information; it just renders the PDF as text. I also tried PDFMiner, but it didn't seem to work at all, at least with my test PDF (I just got empty output).

If it's a library, I'm not too picky about the language, but Python would be a plus.

Jiri · Accepted Answer

I use pdftk to extract some data and manipulate PDFs, but I am not sure whether filled-out forms can be handled the way you need.
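For what it's worth, pdftk has a `dump_data_fields` operation that prints one record per form field (records separated by `---`, with `FieldName:` and `FieldValue:` lines), so it may cover this case. Below is a minimal sketch, assuming pdftk is installed and on the PATH. The helper names `parse_field_dump` and `extract_form_fields` are made up for illustration, and the parser only looks at the name and value lines, ignoring everything else pdftk reports.

```python
import subprocess


def parse_field_dump(dump):
    """Turn the text printed by `pdftk <file> dump_data_fields`
    into a {field name: field value} dict (hypothetical helper)."""
    fields = {}
    name, value = None, ""
    for line in dump.splitlines():
        if line.startswith("---"):
            # A separator starts a new record; store the previous one, if any.
            if name is not None:
                fields[name] = value
            name, value = None, ""
            continue
        key, _, rest = line.partition(":")
        if key == "FieldName":
            name = rest.strip()
        elif key == "FieldValue":
            value = rest.strip()
    # Store the last record, which has no trailing separator.
    if name is not None:
        fields[name] = value
    return fields


def extract_form_fields(pdf_path):
    """Run pdftk on pdf_path and return its form fields as a dict.
    Assumes the `pdftk` executable is available on the PATH."""
    result = subprocess.run(
        ["pdftk", pdf_path, "dump_data_fields"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_field_dump(result.stdout)
```

With that in place, `extract_form_fields("form.pdf")` (the file name is just a placeholder) should give a dict keyed by field name, which is roughly what you get from a submitted HTML form. Fields that were left empty have no `FieldValue:` line in the dump, so the sketch maps them to an empty string.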
