Thananthip.S

Reputation: 21

Spark Scala: javax.crypto.AEADBadTagException: Tag mismatch! when decrypting an AES-GCM-encrypted binary file

Hi, I'm trying to decrypt an encrypted gzip file using sc.binaryFiles.

Scala version: 2.11.12, Spark version: 2.3

Here is my code:

import java.security.MessageDigest
import java.util
import javax.crypto.Cipher
import javax.crypto.spec.{GCMParameterSpec, SecretKeySpec}
import org.apache.commons.codec.binary.Base64

val input_bin = spark.sparkContext.binaryFiles("file.csv.gz.enc")
val input_utf = input_bin.map{f => f._2}.collect()(0).open().readUTF()
val input_base64 = Base64.decodeBase64(input_utf)

val GCM_NONCE_LENGTH = 12 // bytes
val GCM_TAG_LENGTH = 16   // bytes
val key = keyToSpec("<key>", 32, "AES", "UTF-8", Some("SHA-256"))

val cipher = Cipher.getInstance("AES/GCM/NoPadding")
val nonce = input_base64.slice(0, GCM_NONCE_LENGTH)

val spec = new GCMParameterSpec(128, nonce)
cipher.init(Cipher.DECRYPT_MODE, key, spec)
cipher.doFinal(input_base64)

def keyToSpec(key: String, keyLengthByte: Int, encryptAlgorithm: String,
              keyEncode: String = "UTF-8", digestMethod: Option[String] = Some("SHA-1")): SecretKeySpec = {
  // prepare the key for encrypt/decrypt: optionally hash it, then truncate to keyLengthByte
  var keyBytes: Array[Byte] = key.getBytes(keyEncode)

  if (digestMethod.isDefined) {
    val sha: MessageDigest = MessageDigest.getInstance(digestMethod.get)
    keyBytes = sha.digest(keyBytes)
    keyBytes = util.Arrays.copyOf(keyBytes, keyLengthByte)
  }

  new SecretKeySpec(keyBytes, encryptAlgorithm)
}
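(As a sanity check: this should derive the same key as the Python code below, since a SHA-256 digest is already 32 bytes and the copyOf is then effectively a no-op, so I don't think the key derivation is the problem.)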

And I get this error:

javax.crypto.AEADBadTagException: Tag mismatch!
  at com.sun.crypto.provider.GaloisCounterMode.decryptFinal(GaloisCounterMode.java:571)
  at com.sun.crypto.provider.CipherCore.finalNoPadding(CipherCore.java:1046)
  at com.sun.crypto.provider.CipherCore.doFinal(CipherCore.java:983)
  at com.sun.crypto.provider.CipherCore.doFinal(CipherCore.java:845)
  at com.sun.crypto.provider.AESCipher.engineDoFinal(AESCipher.java:446)
  at javax.crypto.Cipher.doFinal(Cipher.java:2165)
  ... 50 elided

The input file is quite big (around 2 GB).

I'm not sure whether I'm reading the file incorrectly or whether the decryption itself is wrong.

I also have a Python version, and that code works fine for me:

import base64
import gzip
import os
import re

from Crypto.Cipher import AES
from Crypto.Hash import SHA256

real_key = key
hash_key = SHA256.new()
hash_key.update(real_key)

for filename in os.listdir(kwargs['enc_path']):
    if re.search(r"\d{12}.csv.gz.enc$", filename) and dec_date in filename:
        with open(os.path.join(kwargs['enc_path'], filename)) as f:
            content = f.read()
        ct = base64.b64decode(content)
        nonce, tag = ct[:12], ct[-16:]  # layout: nonce || ciphertext || tag
        cipher = AES.new(hash_key.digest(), AES.MODE_GCM, nonce)
        dec = cipher.decrypt_and_verify(ct[12:-16], tag)
        decrypted_data = gzip.decompress(dec).decode('utf-8')
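If I read the Python correctly, the decoded layout is a 12-byte nonce, then the ciphertext, then a 16-byte tag at the end. My assumption is that the equivalent split in Scala would look like this (sketch, not verified):

// assumed layout after Base64 decoding: nonce (12 bytes) || ciphertext || tag (16 bytes)
val nonce = input_base64.slice(0, GCM_NONCE_LENGTH)
val ciphertextWithTag = input_base64.drop(GCM_NONCE_LENGTH) // JCE expects the tag to stay appended to the ciphertext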

Any suggestions?

Thank you

Update #1

I was able to resolve the problem by changing the file reading from Spark to local file reading (scala.io), adding the AAD for decryption, and applying the answers from @Topaco and @blackbishop:

cipher.init(Cipher.DECRYPT_MODE, key, spec)
cipher.updateAAD(Array[Byte]()) // the AAD can be any value, but it must match exactly the AAD used when the file was encrypted
cipher.doFinal(input_base64.drop(12)) // drop the nonce before decrypting
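For reference, the local read ends up looking roughly like this (a sketch; I'm using java.nio here, but anything that returns the raw file bytes should work):

// read the Base64 text locally instead of through Spark
val raw = java.nio.file.Files.readAllBytes(java.nio.file.Paths.get("file.csv.gz.enc"))
val input_base64 = Base64.decodeBase64(raw) // commons-codec also accepts a byte array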

I'm still trying to find out why the Spark version doesn't work.
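My suspicion (not verified yet) is readUTF: DataInputStream.readUTF reads a single string prefixed with a two-byte length in modified UTF-8, not the whole stream, so on a ~2 GB file it returns only a short truncated prefix, and a truncated ciphertext is exactly what GCM reports as a tag mismatch. If that's the cause, reading the complete content should look more like this (sketch):

val bytes = input_bin.map(_._2.toArray).first() // PortableDataStream.toArray reads the whole file into a byte array
val input_base64 = Base64.decodeBase64(bytes)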

Upvotes: 2

Views: 580

Answers (1)

blackbishop

Reputation: 32650

You are not using the same decryption method as in the Python code. In the Scala code you are ignoring the tag, and when you call doFinal you are passing the whole buffer, nonce included.

Try with these changes:

// GCMParameterSpec takes the tag length in bits, so convert from bytes
val spec = new GCMParameterSpec(GCM_TAG_LENGTH * 8, nonce)

// remove the nonce from the buffer before calling doFinal
val dec = cipher.doFinal(input_base64.drop(GCM_NONCE_LENGTH))
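Unlike PyCryptodome's decrypt_and_verify, which takes the tag as a separate argument, the JCE GCM implementation expects the tag to remain appended at the end of the ciphertext and verifies it inside doFinal, so only the nonce needs to be stripped.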

Upvotes: 1
