Reputation: 1296
I have a PDF file that I can open with Acrobat Reader, Outlook, or even my browser.
Somehow, I'm having trouble reading/converting it using Java libraries.
The only thing that looks different from other PDF files I'm used to seeing is the HTML code at the start and at the end of the PDF.
So I'm wondering whether this structure is standard or not, and what it is used for?
Upvotes: 1
Views: 2280
Reputation: 95918
So I'm wondering whether this structure is standard or not,
It is not standard. According to the PDF standard (ISO 32000-2, and similarly ISO 32000-1 before it):
The PDF file begins with the 5 characters “%PDF–”
(ISO 32000-2, section 7.5.2 "File header")
Acrobat Reader opens it nonetheless, as it uses relaxed criteria:
Acrobat viewers require only that the header appear somewhere within the first 1024 bytes of the file.
(Adobe PDF Reference sixth edition, appendix H.3 "Implementation Notes", item 13)
and a number of other PDF processors, in particular viewers, follow Adobe's example and do so, too.
Nonetheless, this is a deviation from the standard.
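To make the difference between the two criteria concrete, here is a minimal sketch in plain Java (no PDF library involved). It only looks for the `%PDF-` marker, written with a plain ASCII hyphen, and reports whether it sits at offset 0 (what the standard asks for) or merely somewhere within the first 1024 bytes (what Acrobat tolerates). The class name `PdfHeaderCheck`, its methods, and the labels "strict", "lenient" and "none" are made up for this illustration; this is not how any particular viewer or library implements the check.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

final class PdfHeaderCheck {
    // The header marker, using the ASCII hyphen-minus (0x2D).
    private static final byte[] MARKER = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** Returns the offset of the first "%PDF-" lying entirely within the first `limit` bytes, or -1. */
    static int findHeaderOffset(byte[] data, int limit) {
        int end = Math.min(data.length, limit) - MARKER.length;
        for (int i = 0; i <= end; i++) {
            boolean match = true;
            for (int j = 0; j < MARKER.length; j++) {
                if (data[i + j] != MARKER[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return i;
            }
        }
        return -1;
    }

    /** Hypothetical labels: "strict" = header at offset 0, "lenient" = within 1024 bytes, "none" otherwise. */
    static String classify(byte[] data) {
        int offset = findHeaderOffset(data, 1024);
        if (offset == 0) {
            return "strict";
        }
        return offset > 0 ? "lenient" : "none";
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PdfHeaderCheck <file>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(classify(data) + " (header offset " + findHeaderOffset(data, 1024) + ")");
    }
}
```

A file like the one in the question, with HTML in front of the header, would come out as "lenient" as long as the HTML prefix is short enough.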
and what it is used for?
Apparently that PDF was originally retrieved from some web page, and this web page seems to have a bug: it sends an HTML starting segment even though the request is for a PDF. The PDF library used here (mPDF) outputs an error message to that effect right after the PDF. Due to the relaxed requirements of Adobe Reader and other PDF viewers, though, this bug seems to have gone unnoticed, or at least seems not to have been considered grave enough to fix.
Somehow, I'm having trouble reading/converting it using Java libraries
While PDF viewers can afford to be quite lax (because their respective user can quickly tell whether the result looks broken and drop the file), automatic PDF processors need to be more strict (because otherwise broken data may be stored in legally required archives or sent out to thousands and thousands of recipients).
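If you nonetheless need to feed such a file to a strict Java library, one possible workaround is to cut away the surrounding HTML first. The following is only a sketch under assumptions: it keeps the bytes from the first `%PDF-` up to and including the last `%%EOF`, and it assumes that the cross-reference offsets inside the PDF were calculated relative to the `%PDF-` header (plausible here, since the HTML seems to have been added around an already generated PDF, but not guaranteed). The class `PdfJunkStripper` and its methods are hypothetical names for this illustration, and the code does not validate the PDF in any other way.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class PdfJunkStripper {
    private static final byte[] HEADER = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] EOF_MARKER = "%%EOF".getBytes(StandardCharsets.US_ASCII);

    /** True if `pattern` occurs in `data` starting at index `at`. */
    private static boolean matchesAt(byte[] data, byte[] pattern, int at) {
        for (int j = 0; j < pattern.length; j++) {
            if (data[at + j] != pattern[j]) {
                return false;
            }
        }
        return true;
    }

    /** Index of the first occurrence of `pattern`, or -1. */
    static int indexOf(byte[] data, byte[] pattern) {
        for (int i = 0; i <= data.length - pattern.length; i++) {
            if (matchesAt(data, pattern, i)) {
                return i;
            }
        }
        return -1;
    }

    /** Index of the last occurrence of `pattern`, or -1. */
    static int lastIndexOf(byte[] data, byte[] pattern) {
        for (int i = data.length - pattern.length; i >= 0; i--) {
            if (matchesAt(data, pattern, i)) {
                return i;
            }
        }
        return -1;
    }

    /** Returns the bytes from the first "%PDF-" through the last "%%EOF", dropping everything outside. */
    static byte[] strip(byte[] data) {
        int start = indexOf(data, HEADER);
        int eof = lastIndexOf(data, EOF_MARKER);
        if (start < 0 || eof < start) {
            throw new IllegalArgumentException("no PDF header followed by an %%EOF marker found");
        }
        return Arrays.copyOfRange(data, start, eof + EOF_MARKER.length);
    }
}
```

The stripped bytes can then be written to a new file or handed to whichever library you use. If the library still complains afterwards, the offsets probably do not match the assumption above, and the real fix remains on the side of the web page that produces the file.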
Upvotes: 2