Reputation: 5902
We have an application that generates PDF files. Occasionally, for reasons we have not identified, one of the files is corrupted at creation time. We need to check whether each PDF is corrupted before moving on to the next one, and regenerate it if it is.
Thanks
Upvotes: 1
Views: 11815
Reputation: 427
You can check the PDF header like this:
using System.IO;
using System.Text;

// Returns true if the file starts with the PDF signature "%PDF-".
// This checks the header only; a truncated or damaged file can still pass.
public bool IsPDFHeader(string fileName)
{
    // Dispose the stream and reader even if reading throws.
    using (var fs = new FileStream(fileName, FileMode.Open, FileAccess.Read))
    using (var br = new BinaryReader(fs))
    {
        // ReadBytes returns fewer than 5 bytes if the file is shorter than that.
        byte[] buffer = br.ReadBytes(5);
        if (buffer.Length < 5)
        {
            return false;
        }
        // "%PDF-" in ASCII is 0x25 0x50 0x44 0x46 0x2D.
        return Encoding.ASCII.GetString(buffer) == "%PDF-";
    }
}
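A header check alone will not catch a file that was cut off partway through writing. As a rough sketch (not a full validation), you could also look for the `%%EOF` marker near the end of the file. The class name `PdfQuickCheck`, the method `LooksComplete`, and the 1024-byte search window are my own assumptions for illustration, not part of any library:

```csharp
using System;
using System.IO;
using System.Text;

public static class PdfQuickCheck
{
    // Hypothetical helper: true when the file starts with "%PDF-" and an
    // "%%EOF" marker appears somewhere in its last 1024 bytes.
    public static bool LooksComplete(string fileName)
    {
        const int TailSize = 1024; // assumed search window for the trailer

        using (var fs = new FileStream(fileName, FileMode.Open, FileAccess.Read))
        {
            if (fs.Length < 5) return false;

            // Check the "%PDF-" signature at the start of the file.
            var head = new byte[5];
            if (ReadFully(fs, head) < head.Length) return false;
            if (Encoding.ASCII.GetString(head) != "%PDF-") return false;

            // Look for the end-of-file marker in the tail of the file.
            int tailLength = (int)Math.Min(TailSize, fs.Length);
            var tail = new byte[tailLength];
            fs.Seek(-tailLength, SeekOrigin.End);
            int n = ReadFully(fs, tail);
            return Encoding.ASCII.GetString(tail, 0, n).Contains("%%EOF");
        }
    }

    // Stream.Read may return fewer bytes than requested, so loop until the
    // buffer is full or the stream ends.
    private static int ReadFully(Stream s, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int read = s.Read(buffer, total, buffer.Length - total);
            if (read == 0) break;
            total += read;
        }
        return total;
    }
}
```

This only catches truncated or obviously wrong files. A PDF with a damaged cross-reference table or broken page content will still pass, so a real parser is needed for anything stricter.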
Upvotes: -1
Reputation: 89172
Look at PDF parsers and try to use them to detect the corruption, for example Ghostscript.
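For the Ghostscript route, one possible sketch is to run it against the file with the `nullpage` device and treat a non-zero exit code as a sign of corruption. This assumes a Ghostscript executable named `gs` is on the PATH (on Windows the console binary is usually `gswin64c`), and the class and method names are made up for illustration. Ghostscript tends to repair damaged PDFs silently, so the sketch passes `-dPDFSTOPONERROR`; verify how your Ghostscript version behaves before relying on it:

```csharp
using System;
using System.Diagnostics;

public static class GhostscriptCheck
{
    // Builds the command line: render every page to the "nullpage" device
    // (no output file) and stop on the first PDF error instead of recovering.
    public static string BuildArgs(string pdfPath)
    {
        return "-q -dNOPAUSE -dBATCH -dPDFSTOPONERROR -sDEVICE=nullpage \""
               + pdfPath + "\"";
    }

    // Assumption: gsExecutable resolves to an installed Ghostscript binary.
    // Process.Start throws if the executable cannot be found.
    public static bool RendersCleanly(string pdfPath, string gsExecutable = "gs")
    {
        var info = new ProcessStartInfo(gsExecutable, BuildArgs(pdfPath))
        {
            UseShellExecute = false,
            CreateNoWindow = true
        };

        using (var process = Process.Start(info))
        {
            process.WaitForExit();
            return process.ExitCode == 0;
        }
    }
}
```

In the generation loop you would call `GhostscriptCheck.RendersCleanly(path)` after writing each file and regenerate the PDF when it returns false.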
Disclaimer: I work for Atalasoft
In DotImage Document Imaging, we include some PDF Parsing classes that will throw if the file is corrupt.
If you add our PDF Reader add-on, we will try to rasterize the PDF -- if it's corrupt, that will throw. If the problem is missing pieces, then you can look for them in the resulting image.
Upvotes: 2