Hassan Mokdad

Reputation: 5902

Check if a PDF file is corrupted using C#

We have an application that generates PDF files. Sometimes, for an unknown reason, one of the files is created corrupted. We need to check whether each PDF is corrupted before continuing to the next one, and if it is, create it again.

Thanks

Upvotes: 1

Views: 11815

Answers (2)

rANth

Reputation: 427

You can check the PDF header like this:

using System.IO;

public bool IsPDFHeader(string fileName)
{
    // Every PDF starts with the ASCII signature "%PDF-" (for example "%PDF-1.4").
    byte[] signature = { 0x25, 0x50, 0x44, 0x46, 0x2D };

    // Dispose the stream and reader so the file handle is released.
    using (var fs = new FileStream(fileName, FileMode.Open, FileAccess.Read))
    using (var br = new BinaryReader(fs))
    {
        // ReadBytes returns fewer bytes if the file is shorter than the signature.
        byte[] buffer = br.ReadBytes(signature.Length);
        if (buffer.Length < signature.Length)
        {
            return false;
        }

        for (int i = 0; i < signature.Length; i++)
        {
            if (buffer[i] != signature[i])
            {
                return false;
            }
        }
        return true;
    }
}
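
The header check above only looks at the first few bytes, so a file that was cut off while being written would still pass. A complementary check is to look for the `%%EOF` marker that a complete PDF carries near its end. The following is a minimal sketch: the class name `PdfTrailerCheck`, the method name `HasEofMarker`, and the 1024-byte tail window are my own choices for illustration. It also only catches truncation, not damage in the middle of the file.

```csharp
using System;
using System.IO;
using System.Text;

public static class PdfTrailerCheck
{
    // Returns true if "%%EOF" appears in the last 1024 bytes of the file.
    // The window size is an assumption; some writers append bytes after the marker.
    public static bool HasEofMarker(string fileName)
    {
        const int tailSize = 1024;

        using (var fs = new FileStream(fileName, FileMode.Open, FileAccess.Read))
        {
            // Read at most tailSize bytes, or the whole file if it is smaller.
            int length = (int)Math.Min(tailSize, fs.Length);
            fs.Seek(-length, SeekOrigin.End);

            var tail = new byte[length];
            int read = 0;
            while (read < length)
            {
                int n = fs.Read(tail, read, length - read);
                if (n == 0)
                {
                    break; // unexpected end of stream
                }
                read += n;
            }

            // The marker is plain ASCII, so an ordinal substring search is enough.
            string text = Encoding.ASCII.GetString(tail, 0, read);
            return text.Contains("%%EOF");
        }
    }
}
```

A file that passes both this and the header check can still be corrupt internally, so treat the pair as a cheap first filter before a real parser.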

Upvotes: -1

Lou Franco

Reputation: 89172

Look at PDF parsers and try to use them to detect the corruption. Ghostscript is one example.
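
As a rough sketch of the Ghostscript route, one option is to run the Ghostscript command-line tool against the file with a device that discards its output, and treat a non-zero exit code as a sign of trouble. The class and method names here are hypothetical, and the path to the executable (for example `gswin64c.exe` on Windows) is something you must supply. It is worth confirming against your Ghostscript version that the `-dPDFSTOPONERROR` switch and the exit code behave as assumed.

```csharp
using System;
using System.Diagnostics;

public static class GhostscriptCheck
{
    // Builds the argument string: quiet, non-interactive, stop on PDF errors,
    // and render to the "nullpage" device so no output file is produced.
    public static string BuildArguments(string pdfPath)
    {
        return "-q -dNOPAUSE -dBATCH -dPDFSTOPONERROR -sDEVICE=nullpage \"" + pdfPath + "\"";
    }

    // Runs Ghostscript and returns true when it exits with code 0.
    // Assumption: a non-zero exit code means the file could not be processed.
    public static bool RendersCleanly(string ghostscriptExe, string pdfPath)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = ghostscriptExe,
            Arguments = BuildArguments(pdfPath),
            UseShellExecute = false,
            CreateNoWindow = true
        };

        using (var process = Process.Start(startInfo))
        {
            process.WaitForExit();
            return process.ExitCode == 0;
        }
    }
}
```

In the generation loop, a `false` result from `RendersCleanly` would be the trigger to regenerate that PDF before moving on.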

Disclaimer: I work for Atalasoft

In DotImage Document Imaging, we include some PDF parsing classes that will throw if the file is corrupt.

If you add our PDF Reader add-on, we will try to rasterize the PDF -- if it's corrupt, that will throw. If the problem is missing pieces, then you can look for them in the resulting image.

Upvotes: 2
