D. Bermudez

Reputation: 217

Read PDF Header using C# or VB.Net

I am trying to open a PDF file from my VB.Net application. I get an error (pop-up) saying "File Does Not Begin with '%PDF-'". I would like to read the header of the file to determine whether the file is corrupted. Right now I am using the Windows.Forms.WebBrowser control to display PDF files that I load from a database. Most of the files load fine, but some are corrupt, hence the pop-up.

This is the line I use to load the file: webBrw.Navigate(Me.currentDocPath)

How can I do this in VB.Net 2010?

Upvotes: 0

Views: 4141

Answers (3)

Gary Walker

Reputation: 9134

From the PDF spec:

The first line of a PDF file shall be a header consisting of the 5 characters %PDF- followed by a version number of the form 1.N, where N is a digit between 0 and 7.

It sounds like your file is not actually a valid PDF file; that is the first thing I would double-check. I used to get XML files from a vendor that were not actually valid XML, so the XML parser threw an exception. It surprised me that the vendor refused to fix the problem, since aborting is what XML parsers are supposed to do when a file is not valid. My eventual solution was to write a preparser that corrected the invalid XML and then invoked the standard parser.

I would recommend trying a PDF verification tool; http://www.pdf-tools.com/pdf/validate-pdfa-online.aspx is one example. See the question "How can I test a PDF document if it is PDF/A compliant?" for more. Adobe Preflight (bundled with the professional version) verifies many things, not just whether the file is technically a PDF.

Upvotes: 1

D. Bermudez

Reputation: 217

I have found that if you open the file with a StreamReader and read the first line, you can check whether it starts with the %PDF- header, as below:

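 ' Requires: Imports System.IO
 Dim containsPDFHeader As Boolean = False

 ' Using disposes the reader and releases the file, even if an exception is thrown.
 Using reader As New StreamReader("C:\Users\dbermudez\Desktop\docBOLR_0.pdf")
     Dim firstLine As String = reader.ReadLine()
     ' ReadLine returns Nothing for an empty file; a valid PDF starts with "%PDF-".
     containsPDFHeader = firstLine IsNot Nothing AndAlso
                         firstLine.StartsWith("%PDF-", StringComparison.Ordinal)
 End Using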
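
The same check can also be done at the byte level, which avoids decoding an arbitrarily long "first line" when the file is binary garbage. The following is only a sketch: the module and function names (PdfHeaderCheck, StartsWithPdfMarker, HasPdfHeader) are made up for illustration, and it tests only the five marker bytes, not the version number or the rest of the file.

```vb
Imports System.IO
Imports System.Text

Module PdfHeaderCheck
    ' Returns True when the first `count` valid bytes of `buffer`
    ' begin with the ASCII bytes of "%PDF-".
    Public Function StartsWithPdfMarker(buffer As Byte(), count As Integer) As Boolean
        Dim marker As Byte() = Encoding.ASCII.GetBytes("%PDF-")
        If count < marker.Length Then Return False
        For i As Integer = 0 To marker.Length - 1
            If buffer(i) <> marker(i) Then Return False
        Next
        Return True
    End Function

    ' Hypothetical helper: reads at most the first 5 bytes of the file
    ' and compares them with the marker.
    Public Function HasPdfHeader(path As String) As Boolean
        Dim buffer(4) As Byte ' 5 bytes, indices 0..4
        Dim total As Integer = 0
        Using fs As New FileStream(path, FileMode.Open, FileAccess.Read)
            ' Read may return fewer bytes than requested, so loop until
            ' the buffer is full or the end of the file is reached.
            While total < buffer.Length
                Dim n As Integer = fs.Read(buffer, total, buffer.Length - total)
                If n = 0 Then Exit While
                total += n
            End While
        End Using
        Return StartsWithPdfMarker(buffer, total)
    End Function
End Module
```

With something like this in place, the call from the question could be guarded as If HasPdfHeader(Me.currentDocPath) Then webBrw.Navigate(Me.currentDocPath), showing your own message otherwise. A file that passes this test can still be damaged further in, so it only filters out the case that triggers the pop-up.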

Upvotes: 0

Nicholas Post

Reputation: 1857

Are you able to open the 'errored' files if you access them directly? I had an error like this before, and it turned out to be an issue with the client-side Adobe Reader: certain versions of the reader didn't like files created by certain versions of the writer. We solved it by upgrading the client's reader to the newest version.

I also had a project where I needed to update text in a PDF file. I found out that .Net cannot do this directly, so I had to rely on a separate library. To test a file, you could use such a library to open the file in a try/catch block. If it fails to load, you know the file may be corrupt.

Hope this helps.

Upvotes: 0
