I would like to find out whether a PDF file is encoded in UTF-8. How can I check which character encoding a PDF file uses?
PDF files (before version 2.0) can be either 8-bit binary files or 7-bit ASCII text files (for example, with binary data encoded as ASCII-85). www.Prepressure.com/pdf/basics/fileformat is a good article describing the PDF file format in more detail.
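A minimal sketch of how you might inspect those two properties in the raw bytes. It assumes the `%PDF-x.y` header starts at byte 0 and ignores a `/Version` entry in the document catalog, which can override the header version. `pdf_header_info` is a hypothetical helper name, not a library function.

```python
import re
from typing import Optional, Tuple


def pdf_header_info(data: bytes) -> Tuple[Optional[str], bool]:
    """Return (header version, is_7bit) for the raw bytes of a PDF file.

    Assumption: the '%PDF-x.y' header is at the very start of the file.
    A catalog /Version entry may override this version; it is not checked here.
    """
    match = re.match(rb"%PDF-(\d+\.\d+)", data)
    version = match.group(1).decode("ascii") if match else None
    # A 7-bit ASCII file never contains byte values of 128 or greater.
    is_7bit = all(b < 0x80 for b in data)
    return version, is_7bit
```

Usage (the file name is just an example): `with open("example.pdf", "rb") as f: print(pdf_header_info(f.read()))`. Note that the file must be opened in binary mode (`"rb"`); opening it as text would already force some character encoding onto the bytes.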
In 2017, the PDF 2.0 standard (ISO 32000-2) was released, which, among other changes, added UTF-8 as an additional encoding for text strings. The PDF Association has more information: www.pdfa.org/understanding-utf-8-in-pdf-2-0
In short: it depends on which PDF version the question refers to, at least as far as the file's text strings are concerned.
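In PDF 2.0, a text string announces its encoding with a byte order mark: `FE FF` for UTF-16BE, `EF BB BF` for UTF-8 (new in 2.0), and no marker for PDFDocEncoding. The sketch below classifies a single text string that way. It assumes you already have the string's bytes after literal/hex string escapes have been resolved (for example, from a PDF parsing library); `text_string_encoding` is a hypothetical name.

```python
import codecs


def text_string_encoding(raw: bytes) -> str:
    """Guess the encoding of one PDF text string from its leading bytes."""
    if raw.startswith(codecs.BOM_UTF16_BE):  # b"\xfe\xff"
        return "UTF-16BE"
    if raw.startswith(codecs.BOM_UTF8):  # b"\xef\xbb\xbf", only valid in PDF 2.0
        return "UTF-8"
    # No byte order mark: the string is in PDFDocEncoding.
    return "PDFDocEncoding"
```

This works per string, not per file: one PDF can mix strings in all three encodings.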
A PDF is a binary file, not a text file.
A character encoding like "UTF-8" only makes sense for text files (*.txt, *.html, *.xml, *.csv, ...).
Thus, a PDF is never UTF-8 encoded.
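To see why asking "is this file UTF-8?" does not help, you can try decoding the whole file as UTF-8. The sketch below (a hypothetical helper) does that. PDFs that contain binary data conventionally put a comment line with bytes of 128 or greater right after the header, such as `%\xe2\xe3\xcf\xd3`, and that line alone is not valid UTF-8. Conversely, a purely 7-bit ASCII PDF always "passes", because ASCII is a subset of UTF-8, which still says nothing about how the text inside is encoded.

```python
def decodes_as_utf8(data: bytes) -> bool:
    """Return True if the raw bytes happen to form valid UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Typical start of a binary PDF: header plus a binary-marker comment line.
print(decodes_as_utf8(b"%PDF-1.7\n%\xe2\xe3\xcf\xd3\n"))  # False
```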