Reputation: 16832
I'm using /usr/bin/pdftk filename.pdf dump_data_fields output - flatten
to get the FDF fields in a PDF but it seems to be including invisible FDF fields as well.
https://docdro.id/nriB59b is a one-page PDF without any text but with a number of these invisible FDF fields. pdftk's output can be seen at https://pastebin.com/ag6vweNP.
How can I exclude invisible FDF fields?
I'm currently using pdftk but I'm open to using other tools as well.
Thanks!
Upvotes: 1
Views: 340
Reputation: 6424
My guess is that you have to inspect the PDF yourself to detect whether a field is invisible. On the other hand, it can be very tricky to tell whether a field is invisible unless a flag says so.
For example (although I don't know if it's possible), let's say a field is outside the page or covered by other content. Is it visible or not?
By the way, you can use qpdf to inspect the content of a PDF file. The following command decompresses your PDF so that it is human-readable:
qpdf --qdf --object-streams=disable orig.pdf uncompressed-qpdf.pdf
If you prefer a JSON representation:
qpdf --json your_pdf.pdf > your_pdf.json
If you go for the latter, you can parse the JSON output with jq.
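As a rough illustration of the "unless a flag says so" case, here is a minimal Python sketch that walks the JSON produced by `qpdf --json` and lists widget annotations whose `/F` annotation flags have the Hidden (bit 2) or NoView (bit 6) bit set, as defined in the PDF specification. The function names are my own, and I am assuming that names appear in the JSON as strings such as `"/Widget"` and that text strings may carry a `u:` prefix in newer qpdf JSON versions. It will not catch fields that are merely off-page, zero-sized, or covered by other content, and the `/T` name may live on a parent field object rather than on the widget itself.

```python
import json
import sys

# Annotation flag bits from the PDF specification (/F entry of an annotation).
HIDDEN = 1 << 1  # bit 2: never displayed or printed
NOVIEW = 1 << 5  # bit 6: not displayed on screen


def iter_widgets(node):
    """Yield every dictionary in the JSON tree whose /Subtype is /Widget."""
    if isinstance(node, dict):
        if node.get("/Subtype") == "/Widget":
            yield node
        for value in node.values():
            yield from iter_widgets(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_widgets(item)


def is_flagged_invisible(widget):
    """True if the widget's /F flags include Hidden or NoView."""
    flags = widget.get("/F", 0)
    return isinstance(flags, int) and bool(flags & (HIDDEN | NOVIEW))


def field_name(widget):
    """Return the widget's /T value, dropping an assumed 'u:' string prefix."""
    name = widget.get("/T", "")
    if not isinstance(name, str):
        return ""
    return name[2:] if name.startswith("u:") else name


def invisible_field_names(doc):
    """Collect the names of all widgets flagged as Hidden or NoView."""
    return [field_name(w) for w in iter_widgets(doc) if is_flagged_invisible(w)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_hidden.py your_pdf.json
    with open(sys.argv[1], encoding="utf-8") as fh:
        print("\n".join(invisible_field_names(json.load(fh))))
```

The resulting names could then be used to filter the pdftk `dump_data_fields` output, assuming the names match between the two tools.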
Then, use the PDF specification you want to apply. I also suggest these steps:
diff.
Upvotes: 1