Reputation: 554
I'm trying to reproduce in Python the Open XML documentProtection password hash of an MS Word (2019) document, in order to test the hashing algorithm. I created a Word document and protected it against editing with the password johnjohn. Opening the document as ZIP/XML, I see the following documentProtection element:
<w:documentProtection w:edit="readOnly" w:enforcement="1" w:cryptProviderType="rsaAES" w:cryptAlgorithmClass="hash" w:cryptAlgorithmType="typeAny" w:cryptAlgorithmSid="14" w:cryptSpinCount="100000" w:hash="pVjR9ktO9vlxijXcMPlH+4PLwD4Xwy1aqbNQOFmWaSpvBjipNh//T8S3nBhq6HRoRVfWL6s/+NdUCPTxUr0vZw==" w:salt="pH1TDVHSfGBxkd3Q88UNhQ==" />
According to the Open XML docs (ECMA-376-1:2016 #17.15.1.29):
cryptAlgorithmSid="14" points to the SHA-512 algorithm.
cryptSpinCount="100000" means that hashing must be done in 100k rounds, using the following algorithm (quote from the above standard):
Specifies the number of times the hashing function shall be iteratively run (runs using each iteration's result plus a 4 byte value (0-based, little endian) containing the number of the iteration as the input for the next iteration) when attempting to compare a user-supplied password with the value stored in the hashValue attribute.
The BASE64-encoded salt used for hashing ("pH1TDVHSfGBxkd3Q88UNhQ==") is prepended to the original password. The target BASE64-encoded hash must be "pVjR9ktO9vlxijXcMPlH+4PLwD4Xwy1aqbNQOFmWaSpvBjipNh//T8S3nBhq6HRoRVfWL6s/+NdUCPTxUr0vZw=="
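As a minimal sketch of the quoted spin-count rule in isolation: the helper name `spin` is hypothetical, and this covers only the iteration step, not how Word prepares the password before the first hash.

```python
import hashlib


def spin(initial_digest: bytes, spin_count: int, alg: str = "sha512") -> bytes:
    """Re-hash a digest spin_count times, appending a 4-byte counter each round."""
    digest = initial_digest
    for i in range(spin_count):
        # input for the next round = previous result + iteration number
        # (0-based, 4 bytes, little endian)
        digest = hashlib.new(alg, digest + i.to_bytes(4, "little")).digest()
    return digest


# Illustrative input only: an arbitrary salt and a UTF-16LE string
seed = hashlib.sha512(b"salt" + "pw".encode("utf-16-le")).digest()
print(spin(seed, 3).hex())
```

With `spin_count=0` the function returns its input unchanged, so the first (salted) hash is not counted as a round.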
So my Python script attempts to generate the same hash value with the described algorithm as follows:
import hashlib
import base64
import struct
TARGET_HASH = 'pVjR9ktO9vlxijXcMPlH+4PLwD4Xwy1aqbNQOFmWaSpvBjipNh//T8S3nBhq6HRoRVfWL6s/+NdUCPTxUr0vZw=='
TARGET_SALT = 'pH1TDVHSfGBxkd3Q88UNhQ=='
bsalt = base64.b64decode(TARGET_SALT)

def hashit(what, alg='sha512', **kwargs):
    if alg == 'sha1':
        return hashlib.sha1(what)
    elif alg == 'sha512':
        return hashlib.sha512(what)
    # etc...
    else:
        raise Exception(f'Unsupported hash algorithm: {alg}')

def gethash(data, salt=None, alg='sha512', iters=100000, base64result=True, returnstring=True):
    # encode password in UTF-16LE
    # ECMA-376-1:2016 17.15.1.29 (p. 1026)
    if isinstance(data, str): data = data.encode('utf-16-le')
    # prepend salt if provided
    if not salt is None:
        if isinstance(salt, str): salt = salt.encode('utf-16-le')
        ghash = salt + data
    else:
        ghash = data
    # hash iteratively for 'iters' rounds
    for i in range(iters):
        try:
            # next hash = hash(previous data) + 4-byte integer (previous round number) with LE byte ordering
            # ECMA-376-1:2016 17.15.1.29 (p. 1020)
            ghash = hashit(ghash, alg).digest() + struct.pack('<I', i)
        except Exception as err:
            print(err)
            break
    # remove trailing round number bytes
    ghash = ghash[:-4]
    # BASE64 encode if requested
    if base64result:
        ghash = base64.b64encode(ghash)
    # return as an ASCII string if requested
    if returnstring:
        ghash = ghash.decode()
    return ghash

But then when I run
print(gethash('johnjohn', bsalt))
I get the following hash which is not equal to the target one:
G47RT4/+JdE6pnrP6MqUKa3JyL8abeYSCX+E4+9J+6shiZqImBJ8M6bb+IMKEdvKd6+9dVnQ3oeOsgQz/aCdcQ==
Could I be wrong in my implementation somewhere or do you think there's a difference in the low-level hash function implementation (Python's hashlib vs. Open XML)?
I realized that Word uses a legacy algorithm to pre-process passwords (for compatibility with older versions). This algorithm is described at length in ECMA-376-1:2016 Part 4 (Transitional Migration Features, #14.8.1 "Legacy Password Hash Algorithm"). So I've managed to make a script that reproduces the official ECMA example:
def strtobytes(s, trunc=15):
    b = s.encode('utf-16-le')
    # remove BOM symbol if present
    if b[0] == 0xfeff: b = b[1:]
    pwdlen = min(trunc, len(s))
    if pwdlen < 1: return None
    return bytes([b[i] or b[i+1] for i in range(0, pwdlen * 2, 2)])

def process_pwd(pwd):
    # 1. PREPARE PWD STRING (TRUNCATE, CONVERT TO BYTES)
    pw = strtobytes(pwd) if isinstance(pwd, str) else pwd[:15]
    pwdlen = len(pw)
    # 2. HIGH WORD CALC
    HW = InitialCodeArray[pwdlen - 1]
    for i in range(pwdlen):
        r = 15 - pwdlen + i
        for ibit in range(7):
            if (pw[i] & (0x0001 << ibit)):
                HW ^= EncryptionMatrix[r][ibit]
    # 3. LO WORD CALC
    LW = 0
    for i in reversed(range(pwdlen)):
        LW = (((LW >> 14) & 0x0001) | ((LW << 1) & 0x7FFF)) ^ pw[i]
    LW = (((LW >> 14) & 0x0001) | ((LW << 1) & 0x7FFF)) ^ pwdlen ^ 0xCE4B
    # 4. COMBINE AND REVERSE
    return bytes([LW & 0xff, LW >> 8, HW & 0xff, HW >> 8])

So when I do process_pwd('Example') I get the value given in the ECMA example (0x7EEDCE64). The hashing function was also modified (the initial SALT + HASH should not be included in the main iteration loop, as I found on a forum):
def gethash(data, salt=None, alg='sha512', iters=100000, base64result=True, returnstring=True):
    def hashit(what, alg='sha512'):
        return getattr(hashlib, alg)(what)
    # encode password with legacy algorithm if a string is given
    if isinstance(data, str):
        data = process_pwd(data)
        if data is None:
            print('WRONG PASSWORD STRING!')
            return None
    # prepend salt if provided
    if not salt is None:
        if isinstance(salt, str):
            salt = process_pwd(salt)
            if salt is None:
                print('WRONG SALT STRING!')
                return None
        ghash = salt + data
    else:
        ghash = data
    # initial hash (salted)
    ghash = hashit(ghash, alg).digest()
    # hash iteratively for 'iters' rounds
    for i in range(iters):
        try:
            # next hash = hash(previous data + 4-byte integer (previous round number) with LE byte ordering)
            # ECMA-376-1:2016 17.15.1.29 (p. 1020)
            ghash = hashit(ghash + struct.pack('<I', i), alg).digest()
        except Exception as err:
            print(err)
            return None
    # BASE64 encode if requested
    if base64result:
        ghash = base64.b64encode(ghash)
    # return as an ASCII string if requested
    if returnstring:
        ghash = ghash.decode()
    return ghash

However many times I've re-checked this code, I couldn't see any more errors. But I still can't reproduce the target hash in the test Word document:
myhash = gethash('johnjohn', base64.b64decode('pH1TDVHSfGBxkd3Q88UNhQ=='))
print(myhash)
print(TARGET_HASH == myhash)
I get:
wut2VOpT+X8pKXky6u/+YtwRX2inDv1WVC8FtZcdxKsyX0gHNBJGYwBgV8xzq7Rke/hWMfWe9JVvqDQAZ11A5w==
False
Returning to this question, I've updated my Python code adapting the detailed answer below (thanks @Andrew O!). My full code is now as follows:
# coding: utf-8
import hashlib
import base64
TARGET_HASH = 'pVjR9ktO9vlxijXcMPlH+4PLwD4Xwy1aqbNQOFmWaSpvBjipNh//T8S3nBhq6HRoRVfWL6s/+NdUCPTxUr0vZw=='
TARGET_SALT = 'pH1TDVHSfGBxkd3Q88UNhQ=='
HighOrderWords = [
[0xE1, 0xF0],
[0x1D, 0x0F],
[0xCC, 0x9C],
[0x84, 0xC0],
[0x11, 0x0C],
[0x0E, 0x10],
[0xF1, 0xCE],
[0x31, 0x3E],
[0x18, 0x72],
[0xE1, 0x39],
[0xD4, 0x0F],
[0x84, 0xF9],
[0x28, 0x0C],
[0xA9, 0x6A],
[0x4E, 0xC3]
]
EncryptionMatrix = [
[[0xAE, 0xFC], [0x4D, 0xD9], [0x9B, 0xB2], [0x27, 0x45], [0x4E, 0x8A], [0x9D, 0x14], [0x2A, 0x09]],
[[0x7B, 0x61], [0xF6, 0xC2], [0xFD, 0xA5], [0xEB, 0x6B], [0xC6, 0xF7], [0x9D, 0xCF], [0x2B, 0xBF]],
[[0x45, 0x63], [0x8A, 0xC6], [0x05, 0xAD], [0x0B, 0x5A], [0x16, 0xB4], [0x2D, 0x68], [0x5A, 0xD0]],
[[0x03, 0x75], [0x06, 0xEA], [0x0D, 0xD4], [0x1B, 0xA8], [0x37, 0x50], [0x6E, 0xA0], [0xDD, 0x40]],
[[0xD8, 0x49], [0xA0, 0xB3], [0x51, 0x47], [0xA2, 0x8E], [0x55, 0x3D], [0xAA, 0x7A], [0x44, 0xD5]],
[[0x6F, 0x45], [0xDE, 0x8A], [0xAD, 0x35], [0x4A, 0x4B], [0x94, 0x96], [0x39, 0x0D], [0x72, 0x1A]],
[[0xEB, 0x23], [0xC6, 0x67], [0x9C, 0xEF], [0x29, 0xFF], [0x53, 0xFE], [0xA7, 0xFC], [0x5F, 0xD9]],
[[0x47, 0xD3], [0x8F, 0xA6], [0x0F, 0x6D], [0x1E, 0xDA], [0x3D, 0xB4], [0x7B, 0x68], [0xF6, 0xD0]],
[[0xB8, 0x61], [0x60, 0xE3], [0xC1, 0xC6], [0x93, 0xAD], [0x37, 0x7B], [0x6E, 0xF6], [0xDD, 0xEC]],
[[0x45, 0xA0], [0x8B, 0x40], [0x06, 0xA1], [0x0D, 0x42], [0x1A, 0x84], [0x35, 0x08], [0x6A, 0x10]],
[[0xAA, 0x51], [0x44, 0x83], [0x89, 0x06], [0x02, 0x2D], [0x04, 0x5A], [0x08, 0xB4], [0x11, 0x68]],
[[0x76, 0xB4], [0xED, 0x68], [0xCA, 0xF1], [0x85, 0xC3], [0x1B, 0xA7], [0x37, 0x4E], [0x6E, 0x9C]],
[[0x37, 0x30], [0x6E, 0x60], [0xDC, 0xC0], [0xA9, 0xA1], [0x43, 0x63], [0x86, 0xC6], [0x1D, 0xAD]],
[[0x33, 0x31], [0x66, 0x62], [0xCC, 0xC4], [0x89, 0xA9], [0x03, 0x73], [0x06, 0xE6], [0x0D, 0xCC]],
[[0x10, 0x21], [0x20, 0x42], [0x40, 0x84], [0x81, 0x08], [0x12, 0x31], [0x24, 0x62], [0x48, 0xC4]]
]

def hashit(what, alg='sha1', **kwargs):
    f = getattr(hashlib, alg, None)
    if f is None:
        raise Exception(f'Unsupported hash algorithm: {alg}')
    return f(what)

def strtobytes(s, trunc=15):
    b = s.encode('utf-16-le')
    # remove BOM symbol if present
    if b[0] == 0xfeff: b = b[1:]
    pwdlen = min(trunc, len(s))
    if pwdlen < 1: return None
    return bytearray([b[i] or b[i+1] for i in range(0, pwdlen * 2, 2)])

def generate_hash(password: str, salt: bytes = None, alg: str = 'sha512', iters: int = 100000, base64result=True, returnstring=True):
    """
    Algorithm given in ECMA-376, 1st Edition, December 2006
    https://www.ecma-international.org/wp-content/uploads/ecma-376_first_edition_december_2006.zip
    Alternatively: https://c-rex.net/projects/samples/ooxml/e1/Part4/OOXML_P4_DOCX_documentProtection_topic_ID0EJVTX.html
    """
    # Truncate the password to 15 characters
    passwordBytes = strtobytes(password)
    # Obtain the high-order word from the magic list based on the length of the password.
    # If the password is 0 length, it's just two zero bytes
    passwordLength = len(passwordBytes)
    highOrderWord = bytearray([0, 0])
    # For each byte in the password, grab the bits based on its position in the encryption matrix
    # (taking care that the last character always corresponds to the last row,
    # the first part of the matrix may be unused if the password is shorter than 15 bytes).
    # For the first to seventh bit, if it's set, do a XOR operation with the current value of the high order word.
    # Repeat for each character.
    if passwordLength > 0:
        highOrderWord = bytearray(HighOrderWords[passwordLength - 1])
        for i in range(passwordLength):
            passwordByte = passwordBytes[i]
            m = i + 15 - passwordLength
            for j in range(7):
                if (passwordByte & j) == 0:
                    continue
                for k in range(2):
                    highOrderWord[k] ^= EncryptionMatrix[m][j][k]
    # Grab a low order word (2 bytes) and initialize to zero
    lowOrderWord = 0
    # Perform the operation on each character, starting from the last character in the password and working forwards:
    # low-order word = (((low-order word >> 14) AND 0x0001) | ((low-order word << 1) & 0x7FFF)) ^ character (byte)
    for i in reversed(range(passwordLength)):
        passwordByte = passwordBytes[i]
        lowOrderWord = ( ((lowOrderWord >> 14) & 1) | ((lowOrderWord << 1) & 0x7FFF) ) ^ passwordByte
    # Then do low-order word = (((low-order word >> 14) & 0x0001) | ((low-order word << 1) & 0x7FFF)) ^ password length ^ 0xCE4B
    lowOrderWord = ( ((lowOrderWord >> 14) & 1) | ((lowOrderWord << 1) & 0x7FFF) ) ^ passwordLength ^ 0xCE4B
    lowOrderWord = lowOrderWord.to_bytes(2, 'big')
    # Form the key by appending the low order word to the high order word, then reverse the byte ordering
    key = (highOrderWord + lowOrderWord)[::-1]
    # For some reason, Microsoft Word then uses the Unicode hex representation of the above key,
    # then back converts that representation into bytes
    # In Word, an additional third stage is added to the process of hashing and storing a user supplied password.
    # In this third stage, the reversed byte order legacy hash from the second stage shall be converted to Unicode hex string representation
    # [Example: If the single byte string 7EEDCE64 is converted to Unicode hex string it will be represented in memory as the following byte stream:
    # 37 00 45 00 45 00 44 00 43 00 45 00 36 00 34 00], and that value shall be hashed as defined by the attribute values
    # https://learn.microsoft.com/en-us/openspecs/office_standards/ms-oe376/fb220a2f-88d4-488c-a9b7-e094756b6699
    key = ''.join('{:02x}'.format(x) for x in key).replace('-', '').encode('utf-8')
    computedHash = bytearray(key)
    # Now compute the hash once by prepending the salt bytes to the result from above.
    # If there are no salt bytes, just skip this step
    if salt:
        computedHash = bytearray(salt) + key
    # Word requires that the initial hash of the password with the salt not be considered in the count
    computedHash = bytearray(hashit(computedHash, alg).digest())
    # If there are iterations to compute, for each iteration, convert the iteration count (0-base) to a 32-bit (4 byte) integer (little endian),
    # and (documentation wasn't clear on this, it just said to "add" the bytes - but to align with the output I had to append it) append this to the current computed hash.
    # Apply the requested hash algorithm (Word seems to default to SHA512, but from testing I saw that it handles the other options fine as well)
    for i in range(iters):
        # ISO/IEC 29500-1 Fourth Edition, 2016-11-01
        # 17.15.1.29 - spinCount
        # Specifies the number of times the hashing function shall be iteratively run
        # (runs using each iteration's result plus a 4 byte value (0-based, little endian) containing the number of the iteration
        # as the input for the next iteration) when attempting to compare a user-supplied password with the value stored in the hashValue attribute
        computedHash += i.to_bytes(4, 'little')
        computedHash = bytearray(hashit(computedHash, alg).digest())
    # Return the above as a base-64 encoded string. This is what goes in the documentProtection attribute.
    # BASE64 encode if requested
    if base64result:
        computedHash = base64.b64encode(computedHash)
    # return as an ASCII string if requested
    if returnstring:
        computedHash = computedHash.decode('utf-8')
    return computedHash

# -------------------------------------------------------------------- #
if __name__ == '__main__':
    myhash = generate_hash('johnjohn', base64.b64decode(TARGET_SALT))
    print(myhash)
    print(TARGET_HASH == myhash)
But alas, the comparison still fails, which means I'm still getting something wrong here. Can anyone help port the C# to Python 1:1?
Upvotes: 3
Views: 1286
Reputation: 11
I took a look at Andrew O's implementation and below I provide another Python implementation, albeit without customization (the choice of the hashing algorithm, etc.). With this code, I was able to obtain hashes identical to Andrew O's implementation.
I put comments where the logic differs from s0mbre's code.
Note that this implementation fails for passwords containing non-ASCII characters (characters with codes 128-255). For such passwords the resulting hash is still identical to the one generated by Andrew O's implementation, but the hash generated by Word is different.
high_order_word_lists = [
# See s0mbre's code: https://stackoverflow.com/q/65877620/23666763
# ...
]
encryption_matrix = [
# See s0mbre's code: https://stackoverflow.com/q/65877620/23666763
# ...
]
def generate_hash(password, salt_bytes, num_iterations):
    password_bytes = password.encode('utf-8')
    password_bytes = password_bytes[:15]
    password_length = len(password_bytes)
    if password_length > 0:
        high_order_word_list = high_order_word_lists[password_length - 1].copy()
    else:
        high_order_word_list = [0x00, 0x00]
    for i in range(password_length):
        password_byte = password_bytes[i]
        matrix_index = i + len(encryption_matrix) - password_length
        for j in range(len(encryption_matrix[0])):
            # Only perform XOR operation using the encryption matrix if the j-th bit is set
            mask = 1 << j
            if (password_byte & mask) == 0:
                continue
            for k in range(len(encryption_matrix[0][0])):
                high_order_word_list[k] = high_order_word_list[k] ^ encryption_matrix[matrix_index][j][k]
    low_order_word = 0x0000
    for i in range(password_length - 1, -1, -1):
        password_byte = password_bytes[i]
        low_order_word = (
            (((low_order_word >> 14) & 0x0001) | ((low_order_word << 1) & 0x7fff))
            ^ password_byte
        )
    low_order_word = (
        (((low_order_word >> 14) & 0x0001) | ((low_order_word << 1) & 0x7fff))
        ^ password_length
        ^ 0xce4b
    )
    low_order_word_list = [(low_order_word & 0xff00) >> 8, low_order_word & 0x00ff]
    key = high_order_word_list + low_order_word_list
    key.reverse()
    # `key_str` is a hex string with uppercase hexadecimal letters, e.g. '7EEDCE64'
    # Each byte must be written as two hex digits (zero-padded), as BitConverter.ToString does in the C# code
    key_str = ''.join(f'{c:02X}' for c in key)
    key_bytes = key_str.encode('utf-16le')
    password_hash = salt_bytes + key_bytes
    password_hash = hashlib.sha512(password_hash).digest()
    for i in range(num_iterations):
        password_hash += i.to_bytes(4, byteorder='little')
        password_hash = hashlib.sha512(password_hash).digest()
    return base64.b64encode(password_hash).decode('utf-8')
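For a version that runs without pasting the tables in separately, here is a self-contained sketch of the same steps. It is a condensed port written for illustration, not the code from either answer: the names `legacy_key` and `word_hash` are hypothetical, and the two tables are the question's values packed into 16-bit integers. The key it produces for 'Example' is the 7EEDCE64 value from the specification example, and the second call uses the password and salt from the question, for which the C# answer's test lists the question's target hash as the expected result. Like the code above, it is only expected to match Word for ASCII passwords.

```python
import base64
import hashlib

# Initial high-order word, indexed by (password length - 1); values from the question's table
INITIAL = [0xE1F0, 0x1D0F, 0xCC9C, 0x84C0, 0x110C, 0x0E10, 0xF1CE, 0x313E,
           0x1872, 0xE139, 0xD40F, 0x84F9, 0x280C, 0xA96A, 0x4EC3]

# One row per character position (last character -> last row), one column per bit 0..6
MATRIX = [
    [0xAEFC, 0x4DD9, 0x9BB2, 0x2745, 0x4E8A, 0x9D14, 0x2A09],
    [0x7B61, 0xF6C2, 0xFDA5, 0xEB6B, 0xC6F7, 0x9DCF, 0x2BBF],
    [0x4563, 0x8AC6, 0x05AD, 0x0B5A, 0x16B4, 0x2D68, 0x5AD0],
    [0x0375, 0x06EA, 0x0DD4, 0x1BA8, 0x3750, 0x6EA0, 0xDD40],
    [0xD849, 0xA0B3, 0x5147, 0xA28E, 0x553D, 0xAA7A, 0x44D5],
    [0x6F45, 0xDE8A, 0xAD35, 0x4A4B, 0x9496, 0x390D, 0x721A],
    [0xEB23, 0xC667, 0x9CEF, 0x29FF, 0x53FE, 0xA7FC, 0x5FD9],
    [0x47D3, 0x8FA6, 0x0F6D, 0x1EDA, 0x3DB4, 0x7B68, 0xF6D0],
    [0xB861, 0x60E3, 0xC1C6, 0x93AD, 0x377B, 0x6EF6, 0xDDEC],
    [0x45A0, 0x8B40, 0x06A1, 0x0D42, 0x1A84, 0x3508, 0x6A10],
    [0xAA51, 0x4483, 0x8906, 0x022D, 0x045A, 0x08B4, 0x1168],
    [0x76B4, 0xED68, 0xCAF1, 0x85C3, 0x1BA7, 0x374E, 0x6E9C],
    [0x3730, 0x6E60, 0xDCC0, 0xA9A1, 0x4363, 0x86C6, 0x1DAD],
    [0x3331, 0x6662, 0xCCC4, 0x89A9, 0x0373, 0x06E6, 0x0DCC],
    [0x1021, 0x2042, 0x4084, 0x8108, 0x1231, 0x2462, 0x48C4],
]


def legacy_key(password: str) -> str:
    """Return the legacy hash as an uppercase hex string, bytes in reversed order."""
    pw = password.encode("utf-8")[:15]
    n = len(pw)
    high = INITIAL[n - 1] if n else 0
    for i, byte in enumerate(pw):
        row = MATRIX[i + 15 - n]
        for bit in range(7):
            if byte & (1 << bit):  # test the bit itself, not the bit index
                high ^= row[bit]
    low = 0
    for byte in reversed(pw):
        low = (((low >> 14) & 1) | ((low << 1) & 0x7FFF)) ^ byte
    low = (((low >> 14) & 1) | ((low << 1) & 0x7FFF)) ^ n ^ 0xCE4B
    # reversed byte order: low byte of the low word first, high byte of the high word last
    parts = (low & 0xFF, low >> 8, high & 0xFF, high >> 8)
    return "".join(f"{b:02X}" for b in parts)


def word_hash(password: str, salt: bytes, spin_count: int = 100000, alg: str = "sha512") -> str:
    """Hash salt + UTF-16LE hex key once, then apply spin_count counter rounds."""
    key_bytes = legacy_key(password).encode("utf-16-le")
    digest = hashlib.new(alg, salt + key_bytes).digest()
    for i in range(spin_count):
        digest = hashlib.new(alg, digest + i.to_bytes(4, "little")).digest()
    return base64.b64encode(digest).decode("ascii")


print(legacy_key("Example"))
print(word_hash("johnjohn", base64.b64decode("pH1TDVHSfGBxkd3Q88UNhQ==")))
```

Two details are easy to get wrong: the bit test needs a mask (`1 << bit`), and the hex string must be uppercase, zero-padded to two digits per byte, and encoded as UTF-16LE before it is salted and hashed.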
Upvotes: 1
Reputation: 68
I implemented the method VcSaJen explained in his answer and it worked for me (my use case is creating a hash that MS Word accepts), so I'm posting the code here in case it helps someone:
import hashlib
import os
import base64

def docx_sha512_hash(password, spin_count=0):
    salt = os.urandom(16)
    password_bytes = password.encode("utf-16le")
    combined_bytes = salt + password_bytes
    hashed_password = hashlib.sha512(combined_bytes).digest()
    for i in range(spin_count):
        index_bytes = i.to_bytes(4, byteorder="little")
        hashed_password += index_bytes
        hashed_password = hashlib.sha512(hashed_password).digest()
    b64_hash = base64.b64encode(hashed_password).decode("utf-8")
    b64_salt = base64.b64encode(salt).decode("utf-8")
    return b64_hash, b64_salt

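Then add <w:documentProtection w:edit="readOnly" w:enforcement="1" w:algorithmName="SHA-512" w:spinCount="100000" w:hashValue="<hashValue>" w:saltValue="<saltValue>"/> to the XML and create the docx. The w:spinCount value has to match the spin_count passed to the function (note that its default above is 0).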
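To check a password against a document that already carries these attributes, the same steps can be rerun with the stored salt. A minimal sketch, assuming SHA-512 and the hashValue/saltValue scheme described in this answer (the function name `docx_verify_password` is hypothetical):

```python
import base64
import hashlib
import hmac


def docx_verify_password(password, hash_value, salt_value, spin_count):
    """Recompute the SHA-512 hash from the stored salt and compare it with hashValue."""
    salt = base64.b64decode(salt_value)
    digest = hashlib.sha512(salt + password.encode("utf-16le")).digest()
    for i in range(spin_count):
        digest = hashlib.sha512(digest + i.to_bytes(4, byteorder="little")).digest()
    # compare the raw digests in constant time
    return hmac.compare_digest(digest, base64.b64decode(hash_value))


# Usage: pass the attribute values exactly as they appear in the XML, e.g.
# docx_verify_password("secret", "<hashValue>", "<saltValue>", 100000)
```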
Upvotes: 2
Reputation: 115
You said that you use Word 2019. In this case, you can use newer syntax from MS Word 2010+:
cryptProviderType / cryptAlgorithmClass / cryptAlgorithmType - not used, remove them.
cryptAlgorithmSid - replace with algorithmName (and "SHA-512" instead of "14").
cryptSpinCount - replace with spinCount
hash - replace with hashValue
salt - replace with saltValue
Voila! Encode bytes in Base64 to store them in XML.
MS Word will read those files just fine; you don't have to change any settings. But if you want Word itself to create files with the newer syntax, you can create a DWORD registry value HKCU/Software/Microsoft/Office/<VERSION>/Common/Security/UseIsoPasswordVerifier with value 1. Delete it if you want to revert to compatibility mode.
Upvotes: 3
Reputation: 53
Had to look at this today too and managed to reverse engineer it.
In plain English, the steps are:
For each character, starting from the last character in the password and working forwards:
low-order word = (((low-order word >> 14) & 0x0001) | ((low-order word << 1) & 0x7FFF)) ^ character (byte)
(<<, >> are the bit shift left and right operators. |, &, ^ are the bitwise or, and, and exclusive or respectively.)
Then:
low-order word = (((low-order word >> 14) & 0x0001) | ((low-order word << 1) & 0x7FFF)) ^ password length ^ 0xCE4B.
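The low-order word formulas above can be sketched in Python as follows. The function name `low_order_word` is hypothetical, and the password is assumed to be already truncated to at most 15 single-byte characters:

```python
def low_order_word(password_bytes: bytes) -> int:
    def rotate15(word: int) -> int:
        # rotate a 15-bit value left by one bit
        return ((word >> 14) & 0x0001) | ((word << 1) & 0x7FFF)

    word = 0
    # start from the last character and work towards the first
    for byte in reversed(password_bytes):
        word = rotate15(word) ^ byte
    return rotate15(word) ^ len(password_bytes) ^ 0xCE4B


print(hex(low_order_word(b"Example")))  # 0xed7e
```

For 'Example' this gives 0xED7E, which after the byte reversal becomes the leading "7EED" of the 7EEDCE64 key quoted from the specification.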
Here's my implementation in C# (NuGet):
/// <summary>
/// Class that generates hashes suitable for use with OpenXML Wordprocessing ML documents with the documentProtection element.
/// </summary>
public class WordprocessingMLDocumentProtectionHashGenerator
{
private static readonly byte[][] HighOrderWords = new byte[][]
{
new byte[] { 0xE1, 0xF0 },
new byte[] { 0x1D, 0x0F },
new byte[] { 0xCC, 0x9C },
new byte[] { 0x84, 0xC0 },
new byte[] { 0x11, 0x0C },
new byte[] { 0x0E, 0x10 },
new byte[] { 0xF1, 0xCE },
new byte[] { 0x31, 0x3E },
new byte[] { 0x18, 0x72 },
new byte[] { 0xE1, 0x39 },
new byte[] { 0xD4, 0x0F },
new byte[] { 0x84, 0xF9 },
new byte[] { 0x28, 0x0C },
new byte[] { 0xA9, 0x6A },
new byte[] { 0x4E, 0xC3 }
};
private static readonly byte[,,] EncryptionMatrix = new byte[,,]
{
{ { 0xAE, 0xFC }, { 0x4D, 0xD9 }, { 0x9B, 0xB2 }, { 0x27, 0x45 }, { 0x4E, 0x8A }, { 0x9D, 0x14 }, { 0x2A, 0x09 } },
{ { 0x7B, 0x61 }, { 0xF6, 0xC2 }, { 0xFD, 0xA5 }, { 0xEB, 0x6B }, { 0xC6, 0xF7 }, { 0x9D, 0xCF }, { 0x2B, 0xBF } },
{ { 0x45, 0x63 }, { 0x8A, 0xC6 }, { 0x05, 0xAD }, { 0x0B, 0x5A }, { 0x16, 0xB4 }, { 0x2D, 0x68 }, { 0x5A, 0xD0 } },
{ { 0x03, 0x75 }, { 0x06, 0xEA }, { 0x0D, 0xD4 }, { 0x1B, 0xA8 }, { 0x37, 0x50 }, { 0x6E, 0xA0 }, { 0xDD, 0x40 } },
{ { 0xD8, 0x49 }, { 0xA0, 0xB3 }, { 0x51, 0x47 }, { 0xA2, 0x8E }, { 0x55, 0x3D }, { 0xAA, 0x7A }, { 0x44, 0xD5 } },
{ { 0x6F, 0x45 }, { 0xDE, 0x8A }, { 0xAD, 0x35 }, { 0x4A, 0x4B }, { 0x94, 0x96 }, { 0x39, 0x0D }, { 0x72, 0x1A } },
{ { 0xEB, 0x23 }, { 0xC6, 0x67 }, { 0x9C, 0xEF }, { 0x29, 0xFF }, { 0x53, 0xFE }, { 0xA7, 0xFC }, { 0x5F, 0xD9 } },
{ { 0x47, 0xD3 }, { 0x8F, 0xA6 }, { 0x0F, 0x6D }, { 0x1E, 0xDA }, { 0x3D, 0xB4 }, { 0x7B, 0x68 }, { 0xF6, 0xD0 } },
{ { 0xB8, 0x61 }, { 0x60, 0xE3 }, { 0xC1, 0xC6 }, { 0x93, 0xAD }, { 0x37, 0x7B }, { 0x6E, 0xF6 }, { 0xDD, 0xEC } },
{ { 0x45, 0xA0 }, { 0x8B, 0x40 }, { 0x06, 0xA1 }, { 0x0D, 0x42 }, { 0x1A, 0x84 }, { 0x35, 0x08 }, { 0x6A, 0x10 } },
{ { 0xAA, 0x51 }, { 0x44, 0x83 }, { 0x89, 0x06 }, { 0x02, 0x2D }, { 0x04, 0x5A }, { 0x08, 0xB4 }, { 0x11, 0x68 } },
{ { 0x76, 0xB4 }, { 0xED, 0x68 }, { 0xCA, 0xF1 }, { 0x85, 0xC3 }, { 0x1B, 0xA7 }, { 0x37, 0x4E }, { 0x6E, 0x9C } },
{ { 0x37, 0x30 }, { 0x6E, 0x60 }, { 0xDC, 0xC0 }, { 0xA9, 0xA1 }, { 0x43, 0x63 }, { 0x86, 0xC6 }, { 0x1D, 0xAD } },
{ { 0x33, 0x31 }, { 0x66, 0x62 }, { 0xCC, 0xC4 }, { 0x89, 0xA9 }, { 0x03, 0x73 }, { 0x06, 0xE6 }, { 0x0D, 0xCC } },
{ { 0x10, 0x21 }, { 0x20, 0x42 }, { 0x40, 0x84 }, { 0x81, 0x08 }, { 0x12, 0x31 }, { 0x24, 0x62 }, { 0x48, 0xC4 } }
};
/// <summary>
/// Generates a base-64 string according to the Wordprocessing ML Document DocumentProtection security algorithm.
/// </summary>
/// <param name="password"></param>
/// <param name="salt"></param>
/// <param name="iterations"></param>
/// <param name="hashAlgorithmName"></param>
/// <returns></returns>
public string GenerateHash(string password, byte[] salt, int iterations, HashAlgorithmName hashAlgorithmName)
{
if (password == null)
{
throw new ArgumentNullException(nameof(password));
}
// Algorithm given in ECMA-376, 1st Edition, December 2006
// https://www.ecma-international.org/wp-content/uploads/ecma-376_first_edition_december_2006.zip
// Alternatively: https://c-rex.net/projects/samples/ooxml/e1/Part4/OOXML_P4_DOCX_documentProtection_topic_ID0EJVTX.html
byte[] passwordBytes = Encoding.UTF8.GetBytes(password);
passwordBytes = passwordBytes.Take(15).ToArray();
int passwordLength = passwordBytes.Length;
// If the password length is 0, the key is 0.
byte[] highOrderWord = new byte[] { 0x00, 0x00 };
if (passwordLength > 0)
{
highOrderWord = HighOrderWords[passwordLength - 1].ToArray();
}
for (int i = 0; i < passwordLength; i++)
{
byte passwordByte = passwordBytes[i];
int encryptionMatrixIndex = i + (EncryptionMatrix.GetLength(0) - passwordLength);
BitArray bitArray = passwordByte.ToBitArray();
for (int j = 0; j < EncryptionMatrix.GetLength(1); j++)
{
bool isSet = bitArray[j];
if (isSet)
{
for (int k = 0; k < EncryptionMatrix.GetLength(2); k++)
{
highOrderWord[k] = (byte)(highOrderWord[k] ^ EncryptionMatrix[encryptionMatrixIndex, j, k]);
}
}
}
}
byte[] lowOrderWord = new byte[] { 0x00, 0x00 };
BitSequence lowOrderBitSequence = lowOrderWord.ToBitSequence();
BitSequence bitSequence1 = new byte[] { 0x00, 0x01 }.ToBitSequence();
BitSequence bitSequence7FFF = new byte[] { 0x7F, 0xFF }.ToBitSequence();
for (int i = passwordLength - 1; i >= 0; i--)
{
byte passwordByte = passwordBytes[i];
lowOrderBitSequence = (((lowOrderBitSequence >> 14) & bitSequence1) | ((lowOrderBitSequence << 1) & bitSequence7FFF)) ^ new byte[] { 0x00, passwordByte }.ToBitSequence();
}
lowOrderBitSequence = (((lowOrderBitSequence >> 14) & bitSequence1) | ((lowOrderBitSequence << 1) & bitSequence7FFF)) ^ new byte[] { 0x00, (byte)passwordLength }.ToBitSequence() ^ new byte[] { 0xCE, 0x4B }.ToBitSequence();
lowOrderWord = lowOrderBitSequence.ToByteArray();
byte[] key = highOrderWord.Concat(lowOrderWord).ToArray();
key = key.Reverse().ToArray();
// https://learn.microsoft.com/en-us/openspecs/office_standards/ms-oe376/fb220a2f-88d4-488c-a9b7-e094756b6699
// In Word, an additional third stage is added to the process of hashing and storing a user supplied password. In this third stage, the reversed byte order legacy hash from the second stage shall be converted to Unicode hex string representation [Example: If the single byte string 7EEDCE64 is converted to Unicode hex string it will be represented in memory as the following byte stream: 37 00 45 00 45 00 44 00 43 00 45 00 36 00 34 00. end example], and that value shall be hashed as defined by the attribute values.
key = Encoding.Unicode.GetBytes(BitConverter.ToString(key).Replace("-", string.Empty));
HashAlgorithm hashAlgorithm = hashAlgorithmName.Create();
byte[] computedHash = key;
if (salt != null)
{
computedHash = salt.Concat(key).ToArray();
}
// Word requires that the initial hash of the password with the salt not be considered in the count.
computedHash = hashAlgorithm.ComputeHash(computedHash);
for (int i = 0; i < iterations; i++)
{
// ISO/IEC 29500-1 Fourth Edition, 2016-11-01
// 17.15.1.29 - spinCount
// Specifies the number of times the hashing function shall be iteratively run (runs using each iteration's result plus a 4 byte value (0-based, little endian) containing the number of the iteration as the input for the next iteration) when attempting to compare a user-supplied password with the value stored in the hashValue attribute.
byte[] iterationBytes = BitConverter.GetBytes(i);
computedHash = computedHash.Concat(iterationBytes).ToArray();
computedHash = hashAlgorithm.ComputeHash(computedHash);
}
return Convert.ToBase64String(computedHash);
}
}
I tested it with your example hash and checked that it passed:
[TestClass]
[TestCategory("WordprocessingMLDocumentProtectionHashGenerator")]
public class WordprocessingMLDocumentProtectionHashGeneratorTests
{
[TestMethod]
public void GeneratesKnownHashes()
{
WordprocessingMLDocumentProtectionHashGenerator wordprocessingMLDocumentProtectionHashGenerator = new WordprocessingMLDocumentProtectionHashGenerator();
Assert.AreEqual("sstT7oPzpUQTchSUE6WbidCrZv1c8k+/5D1Pm+weZt7QoaeSnBEg/cZFg2W+1eohg1mgXGXLci1CWbnbHDYsXQ==", wordprocessingMLDocumentProtectionHashGenerator.GenerateHash("Example", Convert.FromBase64String("KPr2WqWFihenPDtAmpqUtw=="), 100000, HashAlgorithmName.SHA512));
Assert.AreEqual("uBuZhlyVTOQtRwQuOGjY7GU3FnJbe1VFKvN+j9u27HSbthOY+n1/daU/WCkqV40fG6HxX+pxgR+Ow4ZvAE7aZg==", wordprocessingMLDocumentProtectionHashGenerator.GenerateHash("Password", Convert.FromBase64String("On9D022mrdqvHTb6eEkFGA=="), 100000, HashAlgorithmName.SHA512));
Assert.AreEqual("mkGbBri0a1icL1nJKTQL7PyLUY2Uei2wyMHC0Y6s1+DOMYvPWdB6cy0Npao15O0+yqtyZW4hAP0+dcdyrEk7qg==", wordprocessingMLDocumentProtectionHashGenerator.GenerateHash("Password", Convert.FromBase64String("On9D022mrdqvHTb6eEkFGA=="), 0, HashAlgorithmName.SHA512));
Assert.AreEqual("qdPI8cSBM/21Mr29mfFrR6l7hIn8oLKKT1nTDXHsAQA=", wordprocessingMLDocumentProtectionHashGenerator.GenerateHash("Testerman", Convert.FromBase64String("On9D022mrdqvHTb6eEkFGA=="), 100000, HashAlgorithmName.SHA256));
Assert.AreEqual("d5FZvHnQhm6Mzqy6cYE7ZbniYXA/8qJxkAze0sFcNirWYhaLpScmSsfBHptuEmuBreLuNjyV5IjdUoOFWM9mbQ==", wordprocessingMLDocumentProtectionHashGenerator.GenerateHash("Password", null, 100000, HashAlgorithmName.SHA512));
Assert.AreEqual("pVjR9ktO9vlxijXcMPlH+4PLwD4Xwy1aqbNQOFmWaSpvBjipNh//T8S3nBhq6HRoRVfWL6s/+NdUCPTxUr0vZw==", wordprocessingMLDocumentProtectionHashGenerator.GenerateHash("johnjohn", Convert.FromBase64String("pH1TDVHSfGBxkd3Q88UNhQ=="), 100000, HashAlgorithmName.SHA512));
}
}
Upvotes: 4