Reputation: 1893

How to detect a corrupted PDF file before merging with iTextSharp in C#

I am using iTextSharp to merge PDF pages.

Some of the PDFs might be corrupted, though.

My question is: how can I verify programmatically whether a PDF is corrupted?

Upvotes: 0

Answers (2)

goTo-devNull

Reputation: 9372

One way, since you're merging files, is to wrap your code in a try...catch block:

// Assumes `document` (Document) and `writer` (PdfWriter) already exist,
// and `testList` holds the paths of the PDFs to merge.
Dictionary<string, Exception> errors =
  new Dictionary<string, Exception>();
document.Open();
PdfContentByte cb = writer.DirectContent;
foreach (string filePath in testList) {
  try {
    PdfReader reader = new PdfReader(filePath);
    int pages = reader.NumberOfPages;
    // PDF page numbers are 1-based
    for (int i = 1; i <= pages; i++) {
      document.NewPage();
      PdfImportedPage page = writer.GetImportedPage(reader, i);
      cb.AddTemplate(page, 0, 0);
    }
  }
  // **may** be valid per the PDF spec, but not supported by iText
  catch (iTextSharp.text.exceptions.UnsupportedPdfException ue) {
    errors[filePath] = ue;
  }
  // invalid according to the PDF spec
  catch (iTextSharp.text.exceptions.InvalidPdfException ie) {
    errors[filePath] = ie;
  }
  // anything else, e.g. a missing or unreadable file
  catch (Exception e) {
    errors[filePath] = e;
  }
}
// Append a report page listing every file that failed.
// The indexer above (instead of Add) avoids an ArgumentException
// when the same path appears twice in testList.
if (errors.Count > 0) {
  document.NewPage();
  foreach (KeyValuePair<string, Exception> entry in errors) {
    document.Add(new Paragraph(string.Format(
      "FILE: {0}\nEXCEPTION: [{1}]: {2}",
      entry.Key, entry.Value.GetType(), entry.Value.Message
    )));
  }
}

where testList is a collection of file paths to the PDF documents you're merging.

On a separate note, you also need to decide what counts as corrupt. Many PDF documents out there do not conform to the PDF specification, yet some readers (Adobe Reader, for example) are smart enough to repair them on the fly.

Upvotes: 0

Wim Haanstra

Reputation: 5998

I usually check the header of a file to see what kind of file it is. A PDF header always starts with %PDF.
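A minimal sketch of that header check, using only the .NET base class library. The class and method names (PdfHeaderCheck, HasPdfHeader) are made up for illustration, and comparing against "%PDF-" rather than just "%PDF" is my assumption about how strict you want to be:

```csharp
using System;
using System.IO;
using System.Text;

static class PdfHeaderCheck
{
    // Hypothetical helper: returns true when the file begins with "%PDF-".
    public static bool HasPdfHeader(string filePath)
    {
        byte[] expected = Encoding.ASCII.GetBytes("%PDF-");
        byte[] actual = new byte[expected.Length];

        using (FileStream stream = File.OpenRead(filePath))
        {
            int read = 0;
            // Stream.Read may return fewer bytes than requested, so loop.
            while (read < actual.Length)
            {
                int n = stream.Read(actual, read, actual.Length - read);
                if (n == 0) return false; // file is shorter than the signature
                read += n;
            }
        }

        for (int i = 0; i < expected.Length; i++)
        {
            if (actual[i] != expected[i]) return false;
        }
        return true;
    }
}
```

This only filters out files that are obviously not PDFs (an image renamed to .pdf, an empty file, and so on). A file that passes can still be damaged further in.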

Of course, the file could be corrupted AFTER the header, in which case I am not sure there is any way other than trying to open and read the document. When the file is corrupted, opening OR reading from it will probably throw an exception. I am not sure iTextSharp throws an exception for every kind of corruption, but you can test that out.
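A rough sketch of that "open and read" test as a standalone pre-check, so a file can be rejected before any merging starts. It assumes iTextSharp 5.x. PdfValidation and IsReadablePdf are hypothetical names, and touching every page's dictionary and content stream is just one plausible definition of "readable", not a guarantee that the file is sound:

```csharp
using System;
using iTextSharp.text.pdf;

static class PdfValidation
{
    // Hypothetical helper: true when iTextSharp can open the file
    // and read every page without throwing.
    public static bool IsReadablePdf(string filePath, out Exception error)
    {
        error = null;
        PdfReader reader = null;
        try
        {
            reader = new PdfReader(filePath);
            for (int i = 1; i <= reader.NumberOfPages; i++)
            {
                // Force the page dictionary and content stream to be parsed.
                reader.GetPageN(i);
                reader.GetPageContent(i);
            }
            return true;
        }
        catch (Exception e)
        {
            // Missing file, bad password, invalid or unsupported PDF, ...
            error = e;
            return false;
        }
        finally
        {
            if (reader != null) reader.Close();
        }
    }
}
```

Call it once per path and only merge the files for which it returns true. The trade-off is that every good file gets parsed twice, once here and once during the merge.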

Upvotes: 1
