Reputation: 21
Hi, I'm trying to decrypt an encrypted gzip file using sc.binaryFiles.
Scala version 2.11.12, Spark version 2.3.
Here is my code:
val input_bin = spark.sparkContext.binaryFiles("file.csv.gz.enc")
val input_utf = input_bin.map{f => f._2}.collect()(0).open().readUTF()
val input_base64 = Base64.decodeBase64(input_utf)
val GCM_NONCE_LENGTH = 12
val GCM_TAG_LENGTH = 16
val key = keyToSpec("<key>", 32, "AES", "UTF-8", Some("SHA-256"))
val cipher = Cipher.getInstance("AES/GCM/NoPadding")
val nonce = input_base64.slice(0, GCM_NONCE_LENGTH)
val spec = new GCMParameterSpec(128, nonce)
cipher.init(Cipher.DECRYPT_MODE, key, spec)
cipher.doFinal(input_base64)
def keyToSpec(key: String, keyLengthByte: Int, encryptAlgorithm: String,
              keyEncode: String = "UTF-8", digestMethod: Option[String] = Some("SHA-1")): SecretKeySpec = {
  // prepare key for encrypt/decrypt
  var keyBytes: Array[Byte] = key.getBytes(keyEncode)
  if (digestMethod.isDefined) {
    val sha: MessageDigest = MessageDigest.getInstance(digestMethod.get)
    keyBytes = sha.digest(keyBytes)
    keyBytes = util.Arrays.copyOf(keyBytes, keyLengthByte)
  }
  new SecretKeySpec(keyBytes, encryptAlgorithm)
}
And I get this error:
javax.crypto.AEADBadTagException: Tag mismatch!
at com.sun.crypto.provider.GaloisCounterMode.decryptFinal(GaloisCounterMode.java:571)
at com.sun.crypto.provider.CipherCore.finalNoPadding(CipherCore.java:1046)
at com.sun.crypto.provider.CipherCore.doFinal(CipherCore.java:983)
at com.sun.crypto.provider.CipherCore.doFinal(CipherCore.java:845)
at com.sun.crypto.provider.AESCipher.engineDoFinal(AESCipher.java:446)
at javax.crypto.Cipher.doFinal(Cipher.java:2165)
... 50 elided
The input file is quite big (around 2GB).
I'm not sure whether I'm reading the file incorrectly or the decryption itself is wrong.
I also have a Python version, and this code works fine for me:
real_key = key
hash_key = SHA256.new()
hash_key.update(real_key)
for filename in os.listdir(kwargs['enc_path']):
    if (re.search("\d{12}.csv.gz.enc$", filename)) and (dec_date in filename):
        with open(os.path.join(kwargs['enc_path'], filename)) as f:
            content = f.read()
        ct = base64.b64decode(content)
        nonce, tag = ct[:12], ct[-16:]
        cipher = AES.new(hash_key.digest(), AES.MODE_GCM, nonce)
        dec = cipher.decrypt_and_verify(ct[12:-16], tag)
        decrypted_data = gzip.decompress(dec).decode('utf-8')
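One porting detail worth spelling out: the blob layout above is nonce(12) || ciphertext || tag(16). PyCryptodome takes the tag as a separate argument to decrypt_and_verify, while javax.crypto's doFinal verifies a tag that stays appended to the ciphertext. A small helper (a sketch, not from the original post; splitForJce is a made-up name) showing the split the JVM side expects:

```scala
// Split a nonce(12) || ciphertext || tag(16) blob the way javax.crypto expects:
// the trailing 16-byte tag must stay attached to the ciphertext, unlike
// PyCryptodome, where the tag is passed separately to decrypt_and_verify.
def splitForJce(blob: Array[Byte], nonceLen: Int = 12): (Array[Byte], Array[Byte]) = {
  val nonce = blob.take(nonceLen)
  val cipherTextWithTag = blob.drop(nonceLen) // ciphertext ++ tag, as JCE's doFinal expects
  (nonce, cipherTextWithTag)
}
```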
Any suggestions?
Thank you.
Update #1
I was able to resolve the problem by changing the file reading method from Spark to local file reading (scala.io), adding the AAD for decryption, and applying the answers from @Topaco and @blackbishop:
cipher.init(Cipher.DECRYPT_MODE, key, spec)
cipher.updateAAD(Array[Byte]()) // the AAD can be any value, but it must match exactly what was used when the file was encrypted
cipher.doFinal(input_base64.drop(12)) // drop the 12-byte nonce before decrypting
I'm still trying to figure out why the Spark approach doesn't work.
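For the local-read step: scala.io.Source is character-oriented, so for a base64 payload a plain java.nio read is a safer way to pull the whole file into memory. A sketch (readBase64File is a made-up helper name, and the whole file is assumed to fit in memory):

```scala
import java.nio.file.{Files, Paths}
import java.util.Base64

// Read a base64-encoded file fully into memory and decode it to raw bytes.
def readBase64File(path: String): Array[Byte] = {
  val text = new String(Files.readAllBytes(Paths.get(path)), "UTF-8")
  Base64.getDecoder.decode(text.trim) // trim a possible trailing newline
}
```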
Upvotes: 2
Views: 580
Reputation: 32650
You are not using the same decryption method as in Python. In the Scala code you are ignoring the tag, and when you call doFinal you're passing the whole buffer, nonce included.
Try with these changes:
// specify the tag length in bits when creating GCMParameterSpec
val spec = new GCMParameterSpec(GCM_TAG_LENGTH * 8, nonce) // GCMParameterSpec expects bits, so 16 bytes => 128
// remove the nonce part from the buffer before calling doFinal
val dec = cipher.doFinal(input_base64.drop(GCM_NONCE_LENGTH))
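Putting the nonce handling and the bits-vs-bytes tag length together, here is a minimal self-contained roundtrip (hypothetical key string and message, no Spark or file I/O) that encrypts with the nonce prefixed and then decrypts cleanly:

```scala
import java.security.{MessageDigest, SecureRandom}
import javax.crypto.Cipher
import javax.crypto.spec.{GCMParameterSpec, SecretKeySpec}

object GcmRoundTrip {
  val GCM_NONCE_LENGTH = 12 // bytes
  val GCM_TAG_LENGTH = 16   // bytes; GCMParameterSpec wants bits, so multiply by 8

  // SHA-256 of the passphrase yields exactly 32 bytes, i.e. an AES-256 key,
  // mirroring the hash_key.digest() derivation in the Python version.
  def keyFromString(key: String): SecretKeySpec = {
    val digest = MessageDigest.getInstance("SHA-256").digest(key.getBytes("UTF-8"))
    new SecretKeySpec(digest, "AES")
  }

  def encrypt(plain: Array[Byte], key: SecretKeySpec): Array[Byte] = {
    val nonce = new Array[Byte](GCM_NONCE_LENGTH)
    new SecureRandom().nextBytes(nonce)
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(GCM_TAG_LENGTH * 8, nonce))
    nonce ++ cipher.doFinal(plain) // doFinal appends the tag; prefix the nonce
  }

  def decrypt(data: Array[Byte], key: SecretKeySpec): Array[Byte] = {
    val nonce = data.take(GCM_NONCE_LENGTH)
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(GCM_TAG_LENGTH * 8, nonce))
    cipher.doFinal(data.drop(GCM_NONCE_LENGTH)) // ciphertext ++ tag, nonce removed
  }
}
```

If the tag length is given in bytes (16) instead of bits (128), SunJCE rejects it, and passing the nonce into doFinal is exactly what produces the `AEADBadTagException: Tag mismatch!` in the question.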
Upvotes: 1