Reputation: 31
I'm using iText 7 (version 7.1.7) in my .NET 4 / C# project to process PDF documents that have user passwords.
The passwords are supplied and everything works fine, except when the password contains a non-ASCII character (like a £ sign).
Does anyone know of a way to get iText 7 to understand a password like "hello£1234"?
I've tried extracting the password bytes by treating my string as UTF-8 or Unicode, but nothing seems to work.
At the point where I try to load the PdfDocument, I just get a "Bad user password" exception.
Here is my code:
string password = "hello£1234";
byte[] passwordBytes = new System.Text.ASCIIEncoding().GetBytes(password);
PdfReader reader = new PdfReader(tempInFile, new ReaderProperties().SetPassword(passwordBytes));
PdfDocument pdfDoc = new PdfDocument(reader);
// Do my stuff with the document here
pdfDoc.Close();
Upvotes: 0
Views: 5125
Reputation: 95918
Actually it depends on the revision of the security handler used to encrypt the PDF you try to open.
ISO 32000-2 specifies:
All passwords for revision 6 shall be based on Unicode. Preprocessing of a user-provided password consists first of normalizing its representation by applying the "SASLPrep" profile (Internet RFC 4013) of the "stringprep" algorithm (Internet RFC 3454) to the supplied password using the Normalize and BiDi options. Next, the password string shall be converted to UTF-8 encoding, and then truncated to the first 127 bytes if the string is longer than 127 bytes (see 7.6.4.3.2, "Algorithm 2.A: Retrieving the encryption key from an encrypted document in order to decrypt it (revision 6 and later)", steps (a, b)).
For other revisions this is not specified and depends on the implementation of the security handler.
Thus, for revision 6 you correctly apply UTF-8 encoding but miss the normalization step. In simple cases that normalization does not change the password, so your code will often succeed.
For other revisions your approach is as good as any ;)
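As a rough illustration of the revision 6 preprocessing quoted above, here is a minimal sketch. It is an approximation only: as far as I know .NET has no built-in SASLprep, so the sketch applies NFKC normalization (one part of SASLprep) and skips the character mapping, prohibited-character and BiDi checks. The class and method names are hypothetical and not part of iText.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of iText.
public static class Revision6Password
{
    // Approximates the revision 6 password preprocessing:
    // normalize, encode as UTF-8, keep at most the first 127 bytes.
    public static byte[] Prepare(string password)
    {
        // Assumption: NFKC stands in for the full SASLprep profile (RFC 4013).
        string normalized = password.Normalize(NormalizationForm.FormKC);

        byte[] utf8 = Encoding.UTF8.GetBytes(normalized);
        if (utf8.Length <= 127)
        {
            return utf8;
        }

        // Truncate to the first 127 bytes, as the specification describes.
        byte[] truncated = new byte[127];
        Array.Copy(utf8, truncated, 127);
        return truncated;
    }
}
```

The result could then be passed to ReaderProperties.SetPassword in place of the bytes from a plain GetBytes call. For passwords that SASLprep would actually change or reject, this sketch may still produce the wrong bytes.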
Upvotes: 0
Reputation: 31
Did you try using Encoding.GetEncoding(1252)?
I find that it normally covers most characters.
Upvotes: 0
Reputation: 31
I thought I'd found the answer in using my system's default code page, but it didn't turn out to be 100% effective.
Plain ASCII can't represent characters like £, but extended ASCII (or code page 437) can. UTF-8 also can, but different types of encoding seem to work in different circumstances.
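To make the difference concrete, here is a small sketch that shows the bytes each encoding produces for a given string. The class and method names are made up for illustration, and it assumes the .NET Framework, where code pages 437 and 1252 are available without registering an extra encoding provider.

```csharp
using System;
using System.Text;

// Hypothetical helper for inspecting password bytes.
public static class PasswordBytes
{
    // Returns the encoded bytes as hyphen-separated hex, e.g. "C2-A3".
    public static string Hex(Encoding encoding, string text)
    {
        return BitConverter.ToString(encoding.GetBytes(text));
    }
}
```

With "\u00A3" (the £ sign) as input, Encoding.ASCII gives 3F (a '?', because ASCII cannot represent the character), Encoding.GetEncoding(437) gives 9C, Encoding.GetEncoding(1252) gives A3 and Encoding.UTF8 gives C2-A3. Only the byte sequence that matches what the encrypting tool used will open the document, which is why the ASCII version fails.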
My solution, for now, is just to try a few. It's a bit of a battering ram approach, so if someone has a more elegant solution then I'd be interested to see it.
Here is my code now:
// Requires: using System; using System.Collections.Generic; using System.Text; using iText.Kernel.Pdf;
Encoding cp437 = Encoding.GetEncoding(437);
List<byte[]> passwordByteList = new List<byte[]>()
{
    Encoding.Default.GetBytes(password), // Default code page
    Encoding.UTF8.GetBytes(password),    // UTF-8 encoding
    cp437.GetBytes(password),            // Code page 437 (extended ASCII) encoding
};
foreach (byte[] passwordBytes in passwordByteList)
{
    PdfReader reader = new PdfReader(tempInFile, new ReaderProperties().SetPassword(passwordBytes));
    try
    {
        // Try to open the PDF with this password encoding
        PdfDocument pdfDoc = new PdfDocument(reader);
        // Do something with the document
        pdfDoc.Close();
        reader.Close();
        break; // Opened successfully, so skip the remaining encodings
    }
    catch (Exception ex)
    {
        System.Diagnostics.Debug.WriteLine(ex.ToString());
        // Exception thrown by the PDF reader. We need to try the next password.
        reader.Close();
    }
}
Upvotes: 3