Reputation: 1593

How to check if a PDF has any kind of digital signature

I need to determine whether a PDF carries any kind of digital signature. I have to handle huge PDFs, e.g. 500 MB each, so I just need a way to separate unsigned files from signed ones (so that only signed PDFs go to the method that processes them). Every procedure I have found so far involves trying to extract the certificate, e.g. via the Bouncy Castle libraries (in my case, for Java): if a certificate is present, the PDF is signed; if it is not present or an exception is raised, it is not (sic!). But this is obviously time- and memory-consuming, as well as a wasteful implementation.

Is there any quick, language-independent way, e.g. opening the PDF file, reading the first bytes, and finding a marker that says the file is signed? Alternatively, is there any reference manual that describes in detail how a PDF is structured internally?

Thank you in advance

Upvotes: 17

Answers (4)

Sampisa

Reputation: 1593

After six years, this is the solution I implemented in Java with iText; it detects the presence of any PAdES signature in an unprotected PDF file.

This easy method returns a 3-state Boolean (don't wallop me for that, lol): Boolean.TRUE means "signed"; Boolean.FALSE means "not signed"; null means that something nasty happened while reading the PDF (in that case, I send the file to the old slow analysis procedure). After scanning about half a million PAdES-signed PDFs, I didn't have any false negatives, and after about 7 million unsigned PDFs I didn't have any false positives.

Maybe I was just lucky (my PDF files were just signed once, and always in the same way), but it seems that this method works - at least for me. Thanks @Patrick Gallot

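// Imports assume iText 5 (package com.itextpdf.text.pdf); logger is an SLF4J-style logger
import java.net.URL;
import java.util.List;
import com.itextpdf.text.pdf.PRAcroForm;
import com.itextpdf.text.pdf.PRAcroForm.FieldInformation;
import com.itextpdf.text.pdf.PdfName;
import com.itextpdf.text.pdf.PdfObject;
import com.itextpdf.text.pdf.PdfReader;

// Returns TRUE if a signature field is found, FALSE if not, null if the PDF could not be read
private Boolean isSigned(URL url)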
{
    try {
        PdfReader reader = new PdfReader(url);
        PRAcroForm acroForm = reader.getAcroForm();
        if (acroForm == null) {
            return false;
        }
        // The following can lead to false negatives
        // boolean hasSigflags = acroForm.getKeys().contains(PdfName.SIGFLAGS);
        // if (!hasSigflags) {
        //     return false;
        // }
        List<?> fields = acroForm.getFields();
        for (Object k : fields) {
            FieldInformation fi = (FieldInformation) k;
            PdfObject ft = fi.getInfo().get(PdfName.FT);
            if (PdfName.SIG.equals(ft)) {
                logger.info("Found signature named {}", fi.getName());
                return true;
            }
        }
    } catch (Exception e) {
        logger.error("Whazzup?", e);
        return null;
    }
    return false;
}

Another function that should work correctly (I recently found it in a paper by Bruno Lowagie, Digital Signatures for PDF documents, page 124) is the following one:

private Boolean isSignedShorter(URL url)
{
    try {
        PdfReader reader = new PdfReader(url);
        AcroFields fields = reader.getAcroFields();
        return !fields.getSignatureNames().isEmpty();
    } catch (Exception e) {
        logger.warn("Whazzup?", e);
        return null;
    }
}

I personally tested it on about a thousand signed/unsigned PDFs and it seems to work too, probably better than mine in case of complex signatures.

I hope this gives a good starting point for solving my original issue :)

Upvotes: 0

dgvirtual

Reputation: 101

From the command line, you can check whether a file has a digital signature with the pdfsig tool from the poppler-utils package (works on Ubuntu 20.04).

pdfsig pdffile.pdf

prints detailed data on the signatures found, together with validation data. If you need to scan a tree of PDF files and get a list of the signed ones, you can use a bash command like:

find ./path/to/files -iname '*.pdf'  \
-exec bash -c 'pdfsig "$0";  \
if [[ $? -eq 0 ]]; then  \
echo "$0" >> signed-files.txt; fi' {} \;

You will get the list of signed files in signed-files.txt in the current directory.

I have found this to be much more reliable than trying to grep some text out of a pdf file (for example, the pdfs produced by signing services in Lithuania do not contain the string "SigFlags" which was mentioned in the previous answers).

Upvotes: 10

yucer

Reputation: 5049

This is not the optimal solution, but it is another one... you can check for "/SigFlags" (PDF names are case-sensitive) and stop at the first match:

grep -m1 "/SigFlags" "${PDF_FILE}"

or collect such files inside a directory:

grep -r --include='*.pdf' -m1 -l "/SigFlags" . > signed_pdfs.txt

grep -r --include='*.pdf' -m1 -L "/SigFlags" . > non_signed_pdfs.txt

Grep can be very fast on big files. You can run it as a batch job for a certain time and process the resulting lists (.txt files) afterwards.
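For batch code that cannot shell out to grep, the same first-match scan can be sketched in plain Java. This is only a sketch under stated assumptions: the class name SigFlagsScan and its methods are made up for illustration, the file is streamed so memory use stays constant, and, like grep, it only sees raw bytes, so a key stored inside a compressed object stream will be missed.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: streams a file and stops at the first raw "/SigFlags".
final class SigFlagsScan {
    private static final byte[] NEEDLE = "/SigFlags".getBytes(StandardCharsets.US_ASCII);

    /** Returns true as soon as the bytes "/SigFlags" appear in the stream. */
    static boolean containsSigFlags(InputStream in) throws IOException {
        int matched = 0; // number of needle bytes matched so far
        int b;
        while ((b = in.read()) != -1) {
            if (b == NEEDLE[matched]) {
                if (++matched == NEEDLE.length) {
                    return true;
                }
            } else {
                // '/' occurs only at the start of the needle, so after a mismatch
                // the only possible partial match is a fresh '/'.
                matched = (b == NEEDLE[0]) ? 1 : 0;
            }
        }
        return false;
    }

    /** Convenience overload that opens the file with a 64 KiB read buffer. */
    static boolean containsSigFlags(Path pdf) throws IOException {
        try (InputStream in = new BufferedInputStream(Files.newInputStream(pdf), 1 << 16)) {
            return containsSigFlags(in);
        }
    }
}
```

A call such as SigFlagsScan.containsSigFlags(Path.of("big.pdf")) would then play the role of grep -m1 for one file; a hit only means the file is worth sending to the real signature check.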

Note that the file could have been modified incrementally after it was signed, so the latest revision might not be covered by any signature. Strictly speaking, "signed" should mean that the latest revision is signed.
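One way to tell the two cases apart, once a library has handed you the signature dictionary, is to compare its /ByteRange array with the file length. The sketch below is an illustration, not part of any library: the class and method names are hypothetical, and it assumes the usual four-integer /ByteRange of the form [offset1 length1 offset2 length2], which you still have to read from the PDF yourself.

```java
// Hypothetical helper: decides whether a signature's /ByteRange reaches the end of the file.
final class ByteRangeCheck {
    /**
     * @param byteRange  the four /ByteRange integers: offset1, length1, offset2, length2
     * @param fileLength total size of the PDF file in bytes
     * @return true if the signed ranges start at byte 0 and end at the last byte,
     *         i.e. nothing was appended after this signature
     */
    static boolean coversWholeFile(long[] byteRange, long fileLength) {
        if (byteRange == null || byteRange.length != 4) {
            return false; // unusual layout; leave it to a full validator
        }
        long offset1 = byteRange[0];
        long length1 = byteRange[1];
        long offset2 = byteRange[2];
        long length2 = byteRange[3];
        // The gap between the two ranges holds the signature bytes themselves.
        return offset1 == 0
                && offset2 >= offset1 + length1
                && offset2 + length2 == fileLength;
    }
}
```

If the second range ends before the end of the file, bytes were appended after signing (for example by an incremental update), and the signature only covers an earlier revision.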

Anyway, if the file does not contain a /SigFlags string, it was almost certainly never signed.

Note that conforming readers start reading from the end of the file, because the end holds the trailer and the offset of the cross-reference table, which records where every object is.

I advise you to use peepdf to inspect the inner structure of the file. It can run its commands against a file directly. For example:
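To make the "read from the end" idea concrete, here is a small Java sketch that reads only the last kilobyte of a file and extracts the number after the final startxref keyword, which is the byte offset of the most recent cross-reference section. The class name PdfTail, its methods, and the 1024-byte tail size are assumptions for illustration; a real parser would also handle malformed tails and follow the offset.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

// Hypothetical helper: locates the last "startxref" offset without reading the whole file.
final class PdfTail {
    private static final String KEYWORD = "startxref";

    /** Parses the decimal offset after the last "startxref" in the given bytes, or returns -1. */
    static long parseStartXref(byte[] tail) {
        // ISO-8859-1 maps every byte to one char, so binary data cannot break decoding.
        String text = new String(tail, StandardCharsets.ISO_8859_1);
        int idx = text.lastIndexOf(KEYWORD);
        if (idx < 0) {
            return -1;
        }
        int i = idx + KEYWORD.length();
        while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
            i++; // skip the end-of-line after the keyword
        }
        int start = i;
        while (i < text.length() && text.charAt(i) >= '0' && text.charAt(i) <= '9') {
            i++;
        }
        return (i > start) ? Long.parseLong(text.substring(start, i)) : -1;
    }

    /** Reads at most the last 1024 bytes of the file and parses them. */
    static long readStartXref(Path pdf) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(pdf.toFile(), "r")) {
            int n = (int) Math.min(1024, raf.length());
            byte[] tail = new byte[n];
            raf.seek(raf.length() - n);
            raf.readFully(tail);
            return parseStartXref(tail);
        }
    }
}
```

Because only the tail is read, the cost is the same for a 5 KB file and a 500 MB one; everything after that point (trailer, catalog, AcroForm) is still work for a PDF library.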

$ peepdf -C "search /SigFlags" signed.pdf

[6]

$ peepdf -C "search /SigFlags" non-signed.pdf

Not found!!

But I have not tested its performance. You can use it to browse the internal structure of the PDF and learn alongside the PDF v1.7 Reference; check the annexes there for example PDFs.

Upvotes: 5

Patrick Gallot

Reputation: 625

You are going to want to use a PDF Library rather than trying to implement this all yourself, otherwise you will get bogged down with handling the variations of Linearized documents, Filters, Incremental updates, object streams, cross-reference streams, and more.

With regard to reference material: from a cursory search, it looks like Adobe no longer provides its version of the ISO 32000-1:2008 specification to all comers, though that specification is mainly a translation of the PDF v1.7 Reference manual into ISO-conforming language.

So assuming the PDF v1.7 Reference, the most relevant sections are going to be 8.7 (Digital Signatures), 3.6.1 (Document Catalog), and 8.6 (Interactive Forms).

The basic process is going to be:

Read the Document Catalog for 'Perms' and 'AcroForm' entries.
Read the 'Perms' dictionary for 'DocMDP', 'UR', or 'UR3' entries. If these entries exist, in all likelihood you have either a certified document or a Reader-enabled document.
Read the 'AcroForm' entry (make sure that you do not have an 'XFA' entry, because in the words of Frazier from Porgy and Bess: Dat's a complication!). You basically want to first check whether there is an (optional) 'SigFlags' entry, in which case a non-zero value would indicate that there is a signature in the 'Fields' array. Otherwise, you need to walk each entry of the 'Fields' array looking for a field dictionary with an 'FT' (Field Type) entry set to 'Sig' (signature) and a 'V' (Value) entry that is not null.
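The decision logic of the last step can be sketched independently of any PDF library. In the Java sketch below, dictionaries that a library has already parsed are modelled as plain Map objects keyed by the PDF name without the slash; that modelling, the class name SignatureCheck, and the method names are assumptions for illustration only. It tests bit 1 of 'SigFlags' (SignaturesExist in the PDF 1.7 Reference), which is slightly stricter than "non-zero", and it does not descend into 'Kids' or handle 'XFA'.

```java
import java.util.List;
import java.util.Map;

// Hypothetical model: a parsed AcroForm dictionary as Map<String, Object>.
final class SignatureCheck {
    /** Bit 1 of /SigFlags: the document contains at least one signature field. */
    static final int SIGNATURES_EXIST = 1;

    static boolean hasSignature(Map<String, Object> acroForm) {
        if (acroForm == null) {
            return false; // no interactive form, so no signature fields
        }
        // Fast path: the optional SigFlags entry.
        Object flags = acroForm.get("SigFlags");
        if (flags instanceof Number && (((Number) flags).intValue() & SIGNATURES_EXIST) != 0) {
            return true;
        }
        // Slow path: walk the top-level Fields array.
        Object fields = acroForm.get("Fields");
        if (!(fields instanceof List)) {
            return false;
        }
        for (Object entry : (List<?>) fields) {
            if (entry instanceof Map) {
                Map<?, ?> field = (Map<?, ?>) entry;
                // A signature field that is actually signed has FT = Sig and a non-null V.
                if ("Sig".equals(field.get("FT")) && field.get("V") != null) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

An empty signature field (FT set to Sig but no V) is deliberately reported as unsigned, matching the "V entry that is not null" condition above.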

Using a PDF library that can use the document's cross-reference table to navigate you to the right indirect objects should be faster and less resource-intensive than a brute-force search of the document for a certificate.

Upvotes: 11
