hotmeatballsoup

Reputation: 605

Python GNUPG Unknown system error when loading private key

Please note: Even though I mention Azure Databricks here, I believe this is a Python/GNUPG problem at heart, and as such, can be answered by anybody with Python/GNUPG encryption experience.


I have the following Python code in my Azure Databricks notebook:

%python

from pyspark.sql import SparkSession
from pyspark.sql.functions import input_file_name, lit
from pyspark.sql.types import StringType
import os
import gnupg
from azure.storage.blob import BlobServiceClient, BlobPrefix
import hashlib
from pyspark.sql import Row
from pyspark.sql.functions import collect_list

# Initialize Spark session
spark = SparkSession.builder.appName("DecryptData").getOrCreate()

storage_account_name = "mycontainer"
storage_account_key = "<redacted>"
spark.conf.set(f"fs.azure.account.key.{storage_account_name}.blob.core.windows.net", storage_account_key)

clientsDF = spark.read.table("myapp.internal.Clients")
row = clientsDF.first()
clientsLabel = row["Label"]
encryptedFilesSource = f"wasbs://{clientsLabel}@mycontainer.blob.core.windows.net/data/*"

decryptedDF = spark.sql(f"""
SELECT
  REVERSE(SUBSTRING_INDEX(REVERSE(input_file_name()), '/', 1)) AS FileName,
  REPLACE(value, '"', '[Q]') AS FileData,
  '{clientsLabel}' as ClientLabel
FROM
  read_files(
    '{encryptedFilesSource}',
    format => 'text',
    wholeText => true
  )
""")

decryptedDF.show()
decryptedDF = decryptedDF.select("FileData");
encryptedData = decryptedDF.first()['FileData']

def decrypt_pgp_data(encrypted_data, private_key_data, passphrase):
    # Initialize GPG object
    gpg = gnupg.GPG()

    print("Loading private key...")

    # Load private key
    private_key = gpg.import_keys(private_key_data)
    if private_key.count == 1:
        keyid = private_key.fingerprints[0]
        gpg.trust_keys(keyid, 'TRUST_ULTIMATE')    
    print("Private key loaded, attempting decryption...")

    try:
        decrypted_data = gpg.decrypt(encrypted_data, passphrase=passphrase, always_trust=True)
    except Exception as e:
        print("Error during decryption:", e)
        return
    
    print("Decryption finished and decrypted_data is of type: " + str(type(decrypted_data)))

    if decrypted_data.ok:
        print("Decryption successful!")
        print("Decrypted Data:")
        print(decrypted_data.data.decode())
    else:
        print("Decryption failed.")
        print("Status:", decrypted_data.status)
        print("Error:", decrypted_data.stderr)
        print("Trust Level:", decrypted_data.trust_text)
        print("Valid:", decrypted_data.valid)


private_key_data = '''-----BEGIN PGP PRIVATE KEY BLOCK-----

<redacted>

-----END PGP PRIVATE KEY BLOCK-----'''

passphrase = '<redacted>'

encrypted_data = b'encryptedData'

decrypt_pgp_data(encrypted_data, private_key_data, passphrase)

As you can see, I am reading PGP-encrypted files from an Azure Blob Storage account container into a Dataframe, and then sending the first row (I'll change this notebook to work on all rows later) through a decrypter function that uses GNUPG.

When this runs it gives me the following output in the driver logs:

+--------+--------------------+-----------+
|FileName|            FileData|ClientLabel|
+--------+--------------------+-----------+
|fizz.pgp|      ���mIj�h�#{...|       acme|
+--------+--------------------+-----------+

Decrypting: <redacted>
Loading private key...
WARNING:gnupg:gpg returned a non-zero error code: 2
Private key loaded, attempting decryption...
Decryption finished and decrypted_data is of type: <class 'gnupg.Crypt'>
Decryption failed.
Status: no data was provided
Error: gpg: no valid OpenPGP data found.
[GNUPG:] NODATA 1
[GNUPG:] NODATA 2
[GNUPG:] FAILURE decrypt 4294967295
gpg: decrypt_message failed: Unknown system error

Trust Level: None
Valid: False

Can anyone spot why decryption is failing, or help me troubleshoot it to pin down the culprit? Setting a debugger is not an option since this is happening inside a notebook. I'm thinking:

  1. Perhaps I'm using the GNUPG API completely wrong
  2. Perhaps there's something malformed or improperly formatted with the private key I'm reading in from an in-memory string variable
  3. Perhaps the encrypted data is malformed (I've seen some internet rumblings of endianness causing this type of error)
  4. Maybe GNUPG isn't trusting my private key for some reason

Can anyone spot where I'm going awry?

Upvotes: 0

Views: 910

Answers (2)

hotmeatballsoup

Reputation: 605

The problem at hand is that Python does not have any modern modules/libraries that can perform PGP decryption without a dependency on the gpg native binary installed and accessible from a shell.

  • Python-GnuPG has this dependency
  • The only other game in town, Python-PGP, is 10 years old (presently) and will not run from an Azure Databricks notebook

I ended up writing a Scala notebook that uses PGPainless, although I had to create a custom "fat" (shaded) JAR for all of PGPainless's transitive dependencies, and this would not be feasible for any developer who isn't strong with Java.

TL;DR --> Python-based decryption from inside an ADB notebook is not advisable.
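
Since python-gnupg shells out to the gpg binary, one quick way to check that dependency from a notebook cell is to look for the executable on PATH. The following is a minimal sketch, not part of the original answer; the helper names `find_gpg` and `gpg_version` are hypothetical, and it assumes the binary is named `gpg` or `gpg2`.

```python
import shutil
import subprocess


def find_gpg(candidates=("gpg", "gpg2")):
    """Return the path of the first candidate executable found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def gpg_version(path):
    """Return the first line of `<path> --version`, or None if it cannot be run."""
    try:
        result = subprocess.run(
            [path, "--version"],
            capture_output=True,
            text=True,
            check=True,
            timeout=10,
        )
    except (OSError, subprocess.SubprocessError):
        return None
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


gpg_path = find_gpg()
print("gpg binary:", gpg_path or "not found on PATH")
if gpg_path:
    print("version:", gpg_version(gpg_path))
```

If the binary lives somewhere that is not on PATH, python-gnupg's `GPG` constructor accepts a `gpgbinary` argument that can point at it.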

Upvotes: 0

larsks

Reputation: 311606

The problem is not with the python-gnupg module.

In the following example code, we first generate a private key, then encrypt some data for that key, and then pass the key and encrypted data to your decrypt_pgp_data function. Everything seems to work as expected; running the code below results in:

gpg: keybox '/tmp/tmpng8xm_d_/pubring.kbx' created
gpg: /tmp/tmpng8xm_d_/trustdb.gpg: trustdb created
gpg: directory '/tmp/tmpng8xm_d_/openpgp-revocs.d' created
gpg: revocation certificate stored as '/tmp/tmpng8xm_d_/openpgp-revocs.d/8DF4D8326BAD790E37B75C8A66F05BDC77FAF5BE.rev'
gpg: checking the trustdb
gpg: marginals needed: 3  completes needed: 1  trust model: pgp
gpg: depth: 0  valid:   1  signed:   0  trust: 0-, 0q, 0n, 0m, 0f, 1u
Loading private key...
Private key loaded, attempting decryption...
Decryption finished and decrypted_data is of type: <class 'gnupg.Crypt'>
Decryption successful!
Decrypted Data:
This is a test

This suggests that the problem is in how you are generating the encrypted data or the private key, but since you don't show that process in your question, it's hard to diagnose.

Here's the code:

import os
import tempfile
import subprocess

import gnupg


def decrypt_pgp_data(encrypted_data, private_key_data, passphrase):
    # Initialize GPG object
    gpg = gnupg.GPG()

    print("Loading private key...")

    # Load private key
    private_key = gpg.import_keys(private_key_data)
    if private_key.count != 1:
        raise ValueError("invalid private key")

    keyid = private_key.fingerprints[0]
    gpg.trust_keys(keyid, "TRUST_ULTIMATE")
    print("Private key loaded, attempting decryption...")

    try:
        decrypted_data = gpg.decrypt(
            encrypted_data, passphrase=passphrase, always_trust=True
        )
    except Exception as e:
        print("Error during decryption:", e)
        return

    print(
        "Decryption finished and decrypted_data is of type: "
        + str(type(decrypted_data))
    )

    if decrypted_data.ok:
        print("Decryption successful!")
        print("Decrypted Data:")
        print(decrypted_data.data.decode())
    else:
        print("Decryption failed.")
        print("Status:", decrypted_data.status)
        print("Error:", decrypted_data.stderr)
        print("Trust Level:", decrypted_data.trust_text)
        print("Valid:", decrypted_data.valid)


passphrase = "secret passphrase"

# Create a temporary directory and use that as GNUPGHOME to avoid mucking
# about with our actual gpg configuration.
with tempfile.TemporaryDirectory() as gnupghome:
    os.environ["GNUPGHOME"] = gnupghome

    # Generate a new private key non-interactively
    genkey = subprocess.Popen(["gpg", "--batch", "--gen-key"], stdin=subprocess.PIPE)
    genkey.communicate(
        input="\n".join(
            [
                "Key-Type: 1",
                "Key-Length: 2048",
                "Subkey-Type: 1",
                "Subkey-Length: 2048",
                "Name-Real: Example User",
                "Name-Email: user@example.com",
                "Expire-Date: 0",
                f"Passphrase: {passphrase}",
            ]
        ).encode()
    )
    genkey.wait()

    # Export the private key.
    private_key_data = subprocess.check_output(
        [
            "gpg",
            "--export-secret-key",
            "-a",
            "--pinentry-mode=loopback",
            f"--passphrase={passphrase}",
            "user@example.com",
        ]
    )

    # Encrypt a sample message to the key we just generated
    # (encryption uses the public half of the key pair).
    encrypt = subprocess.Popen(
        ["gpg", "-ea", "-r", "user@example.com"],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    encrypted_data, _ = encrypt.communicate(input="This is a test".encode())
    encrypt.wait()

# Now we start with a new, empty GNUPGHOME directory so that we're
# confident that we're successfully importing the private key rather than
# using a key already in our keystore.
with tempfile.TemporaryDirectory() as gnupghome:
    os.environ["GNUPGHOME"] = gnupghome
    decrypt_pgp_data(encrypted_data, private_key_data, passphrase)
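
To narrow down whether the encrypted input is the culprit, it may help to inspect the bytes before handing them to gpg. In the question as posted, `encrypted_data` is assigned the literal `b'encryptedData'` rather than the `encryptedData` value read from the DataFrame, and the files are read with `format => 'text'`; either could plausibly explain gpg's "no valid OpenPGP data found". The sketch below is a rough heuristic only, not a validator: the function name `classify_pgp_payload` and its return labels are hypothetical. It relies on two facts about OpenPGP data: ASCII-armored messages start with a `-----BEGIN PGP MESSAGE-----` line, and the first byte of a binary packet always has its high bit set.

```python
ARMOR_PREFIX = b"-----BEGIN PGP MESSAGE-----"
# UTF-8 encoding of U+FFFD, the character inserted when binary data is
# decoded as text with errors replaced.
UTF8_REPLACEMENT = "\ufffd".encode("utf-8")


def classify_pgp_payload(data):
    """Return a rough label for data about to be passed to gpg.

    Labels (hypothetical, for illustration only):
      "str"         - a text string rather than bytes
      "armored"     - starts with the ASCII-armor message header
      "suspect"     - contains UTF-8 replacement characters, which hints
                      (but does not prove) that binary data went through
                      a lossy text decode
      "binary"      - first byte has the high bit set, as every OpenPGP
                      packet header does (necessary, not sufficient)
      "not-openpgp" - none of the above
    """
    if isinstance(data, str):
        return "str"
    if not isinstance(data, (bytes, bytearray)):
        raise TypeError("expected bytes, bytearray, or str")
    data = bytes(data)
    if data.lstrip().startswith(ARMOR_PREFIX):
        return "armored"
    if UTF8_REPLACEMENT in data:
        return "suspect"
    if data and data[0] & 0x80:
        return "binary"
    return "not-openpgp"


# The literal from the question is plain ASCII text, not OpenPGP data.
print(classify_pgp_payload(b"encryptedData"))
```

A "binary" or "armored" label does not guarantee that decryption will succeed; it only rules out the most obvious input mistakes before gpg is invoked.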

Upvotes: 0
