Reputation: 1893
I am using iTextSharp to merge PDF pages.
But some of the PDFs might be corrupted.
My question is: how can I verify programmatically whether a PDF is corrupted?
Upvotes: 0
Views: 2664
Reputation: 9372
One way, since you're merging files, is to wrap your code in a try...catch block:
    // Assumes `document` (Document) and `writer` (PdfWriter) were created earlier.
    // Collects the exception for each file that could not be read.
    Dictionary<string, Exception> errors =
        new Dictionary<string, Exception>();
    document.Open();
    PdfContentByte cb = writer.DirectContent;
    foreach (string filePath in testList) {
        try {
            PdfReader reader = new PdfReader(filePath);
            int pages = reader.NumberOfPages;
            // iText page numbers are 1-based, hence the ++i below
            for (int i = 0; i < pages; ) {
                document.NewPage();
                PdfImportedPage page = writer.GetImportedPage(reader, ++i);
                cb.AddTemplate(page, 0, 0);
            }
        }
        // **may** be PDF spec, but not supported by iText
        catch (iTextSharp.text.exceptions.UnsupportedPdfException ue) {
            errors.Add(filePath, ue);
        }
        // invalid according to PDF spec
        catch (iTextSharp.text.exceptions.InvalidPdfException ie) {
            errors.Add(filePath, ie);
        }
        // anything else, e.g. a missing or unreadable file
        catch (Exception e) {
            errors.Add(filePath, e);
        }
    }
    // Append a final page listing every file that failed
    if (errors.Keys.Count > 0) {
        document.NewPage();
        foreach (string key in errors.Keys) {
            document.Add(new Paragraph(string.Format(
                "FILE: {0}\nEXCEPTION: [{1}]: {2}",
                key, errors[key].GetType(), errors[key].Message
            )));
        }
    }
where testList is a collection of file paths to the PDF documents you're merging.
On a separate note, you also need to consider what you define as corrupt. There are many PDF documents out there that do not meet PDF specs, but some readers (Adobe Reader) are smart enough to fix/repair them on the fly.
Upvotes: 0
Reputation: 5998
I usually check the header of a file to see what kind of file it is. A PDF header always starts with %PDF.
Of course, the file could be corrupted after the header. In that case I am not sure there is any other way than trying to open and read the document. When the file is corrupted, opening or reading it will probably throw an exception. I am not sure iTextSharp throws an exception for every kind of corruption, but you can test that.
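As a rough sketch of the header check, something like the following could work. The class and method names (PdfHeaderCheck, LooksLikePdf) are made up for illustration and are not part of iTextSharp; it only uses System.IO. It assumes the marker sits at the very first byte of the file, so a file with leading junk that some viewers still open would be rejected.

```csharp
using System;
using System.IO;
using System.Text;

static class PdfHeaderCheck
{
    // Hypothetical helper (not an iTextSharp API): returns true when the
    // first four bytes of the file are the ASCII characters "%PDF".
    public static bool LooksLikePdf(string filePath)
    {
        byte[] expected = Encoding.ASCII.GetBytes("%PDF");
        byte[] actual = new byte[expected.Length];

        try
        {
            using (FileStream fs = File.OpenRead(filePath))
            {
                int read = 0;
                // Stream.Read may return fewer bytes than requested,
                // so loop until the buffer is full or the file ends.
                while (read < actual.Length)
                {
                    int n = fs.Read(actual, read, actual.Length - read);
                    if (n == 0)
                    {
                        return false; // file is shorter than the header
                    }
                    read += n;
                }
            }
        }
        catch (IOException)
        {
            // Missing or locked file: treat it as "not a usable PDF".
            return false;
        }

        for (int i = 0; i < expected.Length; i++)
        {
            if (actual[i] != expected[i])
            {
                return false;
            }
        }
        return true;
    }
}
```

Calling PdfHeaderCheck.LooksLikePdf(filePath) before handing the file to PdfReader only filters out files that are obviously not PDFs. A file that passes can still be damaged further in, so you would still want the try...catch around the actual read.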
Upvotes: 1