I would like to extract form data from a PDF using a library, preferably a free software library that is packaged in Ubuntu.
For example, say I have an HTML form, but I would also like users to be able to submit a filled-out PDF form instead.
So what I'm looking for is a library (or a simple CLI utility) that takes a PDF as input and lets me extract the filled-out fields by name, much like with an HTML form.
I have tried pdftotext, but it doesn't preserve the field information; it just renders the PDF as text. I also tried PDFMiner, but it didn't seem to work at all (at least with my test PDF): I just got empty output.
If it's a library, I'm not too picky about the language, but Python would be a plus.
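To make "extract the filled-out fields by name" concrete: in a PDF form (AcroForm), each field is a dictionary whose partial name is stored under `/T` and whose current value is stored under `/V`. Below is a rough, standard-library-only Python sketch of that lookup. `extract_form_fields` is a hypothetical helper, not the API of PDFMiner or any other library, and it only handles a simple case. It assumes an uncompressed PDF (no object streams or compressed xref), field names and values written as literal strings or names, and terminal fields that carry their own `/T` and `/V` (no `/Parent`/`/Kids` hierarchies). Many real-world PDFs break at least one of these assumptions, which is exactly why a proper PDF library is the better route.

```python
import re
from typing import Dict

# One indirect object: "N G obj ... endobj" (assumes uncompressed objects).
OBJ_RE = re.compile(rb"\d+\s+\d+\s+obj\b(.*?)\bendobj", re.DOTALL)
# Field name: /T followed by a literal string; "/Type", "/Tx", "/TU" don't match.
NAME_RE = re.compile(rb"/T\s*\(((?:\\.|[^\\)])*)\)")
# Field value: /V followed by a literal string (text) or a name (e.g. checkbox /Yes).
VALUE_RE = re.compile(rb"/V\s*(?:\(((?:\\.|[^\\)])*)\)|/([^\s/<>\[\]()]+))")


def _decode(raw: bytes) -> str:
    # Undo only the \( \) \\ escapes; octal and other escapes are not handled.
    raw = re.sub(rb"\\([()\\])", rb"\1", raw)
    if raw.startswith(b"\xfe\xff"):  # UTF-16BE text string with BOM
        return raw[2:].decode("utf-16-be", errors="replace")
    # Approximation: latin-1 matches PDFDocEncoding for plain ASCII text.
    return raw.decode("latin-1")


def extract_form_fields(pdf_bytes: bytes) -> Dict[str, str]:
    """Map field name -> value for simple, uncompressed AcroForm fields."""
    fields: Dict[str, str] = {}
    for obj in OBJ_RE.finditer(pdf_bytes):
        body = obj.group(1)
        name = NAME_RE.search(body)
        if name is None:
            continue  # not a named form field
        key = _decode(name.group(1))
        value = VALUE_RE.search(body)
        if value is None:
            fields[key] = ""  # field exists but was left empty
        elif value.group(1) is not None:
            fields[key] = _decode(value.group(1))  # text field value
        else:
            fields[key] = value.group(2).decode("latin-1")  # name value, e.g. "Yes"/"Off"
    return fields
```

Usage would be along the lines of reading the submitted file in binary mode and calling `extract_form_fields(f.read())`, which returns a plain dict such as `{"name": "Jane Doe", "agree": "Yes"}`, i.e. the same shape as HTML form data. A real PDF library would additionally decompress object streams, resolve inherited values from parent fields, and decode strings properly.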