Please note: even though I mention Azure Databricks here, I believe this is a Python/GnuPG problem at heart and, as such, can be answered by anyone with Python/GnuPG encryption experience.
I have the following Python code in my Azure Databricks notebook:
%python
from pyspark.sql import SparkSession
from pyspark.sql.functions import input_file_name, lit
from pyspark.sql.types import StringType
import os
import gnupg
from azure.storage.blob import BlobServiceClient, BlobPrefix
import hashlib
from pyspark.sql import Row
from pyspark.sql.functions import collect_list
# Initialize Spark session
spark = SparkSession.builder.appName("DecryptData").getOrCreate()
storage_account_name = "mycontainer"
storage_account_key = "<redacted>"
spark.conf.set(f"fs.azure.account.key.{storage_account_name}.blob.core.windows.net", storage_account_key)
clientsDF = spark.read.table("myapp.internal.Clients")
row = clientsDF.first()
clientsLabel = row["Label"]
encryptedFilesSource = f"wasbs://{clientsLabel}@mycontainer.blob.core.windows.net/data/*"
decryptedDF = spark.sql(f"""
SELECT
REVERSE(SUBSTRING_INDEX(REVERSE(input_file_name()), '/', 1)) AS FileName,
REPLACE(value, '"', '[Q]') AS FileData,
'{clientsLabel}' as ClientLabel
FROM
read_files(
'{encryptedFilesSource}',
format => 'text',
wholeText => true
)
""")
decryptedDF.show()
decryptedDF = decryptedDF.select("FileData")
encryptedData = decryptedDF.first()['FileData']
def decrypt_pgp_data(encrypted_data, private_key_data, passphrase):
    # Initialize GPG object
    gpg = gnupg.GPG()
    print("Loading private key...")
    # Load private key
    private_key = gpg.import_keys(private_key_data)
    if private_key.count == 1:
        keyid = private_key.fingerprints[0]
        gpg.trust_keys(keyid, 'TRUST_ULTIMATE')
    print("Private key loaded, attempting decryption...")
    try:
        decrypted_data = gpg.decrypt(encrypted_data, passphrase=passphrase, always_trust=True)
    except Exception as e:
        print("Error during decryption:", e)
        return
    print("Decryption finished and decrypted_data is of type: " + str(type(decrypted_data)))
    if decrypted_data.ok:
        print("Decryption successful!")
        print("Decrypted Data:")
        print(decrypted_data.data.decode())
    else:
        print("Decryption failed.")
        print("Status:", decrypted_data.status)
        print("Error:", decrypted_data.stderr)
        print("Trust Level:", decrypted_data.trust_text)
        print("Valid:", decrypted_data.valid)
private_key_data = '''-----BEGIN PGP PRIVATE KEY BLOCK-----
<redacted>
-----END PGP PRIVATE KEY BLOCK-----'''
passphrase = '<redacted>'
encrypted_data = b'encryptedData'
decrypt_pgp_data(encrypted_data, private_key_data, passphrase)
As you can see, I am reading PGP-encrypted files from an Azure Blob Storage container into a DataFrame, and then sending the first row (I'll change this notebook to work on all rows later) through a decryption function that uses GnuPG.
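One thing worth checking in this setup: a `.pgp` file written by `gpg -e` without `--armor` is binary, yet the query reads it with `format => 'text'` and then runs `REPLACE(value, '"', '[Q]')` over it. The `���` characters in the `FileData` column suggest the bytes are not valid UTF-8. The stdlib-only sketch below assumes that the text-reader path behaves like a UTF-8 decode that substitutes U+FFFD for invalid sequences (not verified against Spark's internals) and shows how such a round trip alters binary bytes while leaving ASCII-armored data intact. The helper names and sample bytes are made up for illustration.

```python
# Sketch only: assumes the text-reader path behaves like a UTF-8 decode that
# substitutes U+FFFD for invalid byte sequences (not verified against Spark).


def roundtrip_as_text(raw: bytes) -> bytes:
    """Decode bytes as UTF-8 with replacement characters, then re-encode."""
    return raw.decode("utf-8", errors="replace").encode("utf-8")


def replace_quotes(raw: bytes) -> bytes:
    """Mimic the REPLACE(value, '"', '[Q]') step from the question's SQL."""
    text = raw.decode("utf-8", errors="replace")
    return text.replace('"', "[Q]").encode("utf-8")


# Made-up bytes resembling the start of a binary OpenPGP message:
# 0x85 is an old-format packet header (bit 7 set), which is not a valid
# first byte of a UTF-8 sequence; 0x22 is an ASCII double quote.
sample = bytes([0x85, 0x01, 0x0C, 0x03, 0x22, 0xFF, 0x41])

# ASCII-armored ciphertext is plain ASCII, so it survives a text round trip.
armored = b"-----BEGIN PGP MESSAGE-----\n\nhQEMA1234\n-----END PGP MESSAGE-----\n"

print(roundtrip_as_text(sample) == sample)    # False: bytes were altered
print(replace_quotes(sample) == sample)       # False: quote byte rewritten
print(roundtrip_as_text(armored) == armored)  # True
```

If the files turn out to be binary, reading them as raw bytes (or having the sender produce ASCII-armored output) avoids this class of corruption.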
When this runs it gives me the following output in the driver logs:
+--------------------+--------------------+-------+
| FileName| FileData| ClientLabel |
+--------------------+--------------------+-------+
| fizz.pgp|���mIj�h�#{... | acme|
+--------------------+--------------------+-------+
Decrypting: <redacted>
Loading private key...
WARNING:gnupg:gpg returned a non-zero error code: 2
Private key loaded, attempting decryption...
Decryption finished and decrypted_data is of type: <class 'gnupg.Crypt'>
Decryption failed.
Status: no data was provided
Error: gpg: no valid OpenPGP data found.
[GNUPG:] NODATA 1
[GNUPG:] NODATA 2
[GNUPG:] FAILURE decrypt 4294967295
gpg: decrypt_message failed: Unknown system error
Trust Level: None
Valid: False
Can anyone spot why decryption is failing, or help me troubleshoot it to pin down the culprit? Setting a debugger is not an option since this is happening inside a notebook.
The problem at hand is that Python does not have any modern modules/libraries that can perform PGP decryption without a dependency on the gpg native binary being installed and accessible from a shell.
I ended up writing a Scala notebook that uses PGPainless, although I had to create a custom "fat" (shaded) JAR bundling all of PGPainless's transitive dependencies, and this would not be feasible for a developer who isn't strong with Java.
TL;DR --> Python-based decryption from inside an Azure Databricks (ADB) notebook is not advisable.
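Because python-gnupg is a wrapper that runs the gpg executable in a subprocess (its `gnupg.GPG` class takes a `gpgbinary` argument naming that executable), the binary has to exist on whichever machine runs the Python code. A small stdlib-only check, sketched below with a hypothetical helper name `gpg_binary_info`, can confirm this from a notebook cell. It only inspects the machine it runs on (the driver, for ordinary notebook code); code running inside Spark tasks would execute on worker nodes, which this does not check.

```python
import shutil
import subprocess


def gpg_binary_info(binary="gpg"):
    """Return (path, first line of `--version`), or (None, None) if not on PATH.

    Hypothetical helper: python-gnupg shells out to this executable, so it
    must be installed where the Python code actually runs.
    """
    path = shutil.which(binary)
    if path is None:
        return None, None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=True
    )
    lines = result.stdout.splitlines()
    return path, (lines[0] if lines else "")


path, version = gpg_binary_info()
print(path, version)
```

A `None` path means python-gnupg will fail before any decryption is attempted; a non-`None` path only shows the binary is present, not that keys or input data are correct.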
The problem is not with the python-gnupg module.
In the following example code, we first generate a private key, then encrypt some data to that key, and then pass the key and the encrypted data to your decrypt_pgp_data function. Everything works as expected; running the code below produces:
gpg: keybox '/tmp/tmpng8xm_d_/pubring.kbx' created
gpg: /tmp/tmpng8xm_d_/trustdb.gpg: trustdb created
gpg: directory '/tmp/tmpng8xm_d_/openpgp-revocs.d' created
gpg: revocation certificate stored as '/tmp/tmpng8xm_d_/openpgp-revocs.d/8DF4D8326BAD790E37B75C8A66F05BDC77FAF5BE.rev'
gpg: checking the trustdb
gpg: marginals needed: 3 completes needed: 1 trust model: pgp
gpg: depth: 0 valid: 1 signed: 0 trust: 0-, 0q, 0n, 0m, 0f, 1u
Loading private key...
Private key loaded, attempting decryption...
Decryption finished and decrypted_data is of type: <class 'gnupg.Crypt'>
Decryption successful!
Decrypted Data:
This is a test
This suggests that the problem is in how you are generating the encrypted data or the private key, but since you don't show that process in your question, it's hard to diagnose.
Here's the code:
import os
import tempfile
import subprocess
import gnupg
def decrypt_pgp_data(encrypted_data, private_key_data, passphrase):
    # Initialize GPG object
    gpg = gnupg.GPG()
    print("Loading private key...")
    # Load private key
    private_key = gpg.import_keys(private_key_data)
    if private_key.count != 1:
        raise ValueError("invalid private key")
    keyid = private_key.fingerprints[0]
    gpg.trust_keys(keyid, "TRUST_ULTIMATE")
    print("Private key loaded, attempting decryption...")
    try:
        decrypted_data = gpg.decrypt(
            encrypted_data, passphrase=passphrase, always_trust=True
        )
    except Exception as e:
        print("Error during decryption:", e)
        return
    print(
        "Decryption finished and decrypted_data is of type: "
        + str(type(decrypted_data))
    )
    if decrypted_data.ok:
        print("Decryption successful!")
        print("Decrypted Data:")
        print(decrypted_data.data.decode())
    else:
        print("Decryption failed.")
        print("Status:", decrypted_data.status)
        print("Error:", decrypted_data.stderr)
        print("Trust Level:", decrypted_data.trust_text)
        print("Valid:", decrypted_data.valid)
passphrase = "secret passphrase"
# Create a temporary directory and use that as GNUPGHOME to avoid mucking
# about with our actual gpg configuration.
with tempfile.TemporaryDirectory() as gnupghome:
    os.environ["GNUPGHOME"] = gnupghome
    # Generate a new private key non-interactively
    genkey = subprocess.Popen(["gpg", "--batch", "--gen-key"], stdin=subprocess.PIPE)
    genkey.communicate(
        input="\n".join(
            [
                "Key-Type: 1",
                "Key-Length: 2048",
                "Subkey-Type: 1",
                "Subkey-Length: 2048",
                "Name-Real: Example User",
                "Name-Email: [email protected]",
                "Expire-Date: 0",
                f"Passphrase: {passphrase}",
            ]
        ).encode()
    )
    genkey.wait()
    # Export the private key.
    private_key_data = subprocess.check_output(
        [
            "gpg",
            "--export-secret-key",
            "-a",
            "--pinentry-mode=loopback",
            f"--passphrase={passphrase}",
            "[email protected]",
        ]
    )
    # Encrypt a sample message to the new key (gpg uses its public half).
    encrypt = subprocess.Popen(
        ["gpg", "-ea", "-r", "[email protected]"],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    encrypted_data, _ = encrypt.communicate(input="This is a test".encode())
    encrypt.wait()
# Now we start with a new, empty GNUPGHOME directory so that we're
# confident that we're successfully importing the private key rather than
# using a key already in our keystore.
with tempfile.TemporaryDirectory() as gnupghome:
    os.environ["GNUPGHOME"] = gnupghome
    decrypt_pgp_data(encrypted_data, private_key_data, passphrase)
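The gpg log in the question ends with `no valid OpenPGP data found` and `NODATA`, which is consistent with input that does not look like an OpenPGP message at all. Note that the question's notebook sets `encrypted_data = b'encryptedData'`, a 13-byte literal, rather than using the `encryptedData` value read from the DataFrame. A cheap pre-flight check, sketched below with a hypothetical helper `looks_like_openpgp`, can catch this kind of mix-up before gpg is called. It is a heuristic based on RFC 4880 (an armored message begins with a `-----BEGIN PGP` line, and every binary packet header has bit 7 set), not a validator: a `True` result does not mean the data will decrypt.

```python
def looks_like_openpgp(data) -> bool:
    """Heuristic check that `data` could be an OpenPGP message.

    Hypothetical helper. Accepts str or bytes. Returns True for ASCII-armored
    input or for binary input whose first byte has bit 7 set (RFC 4880
    packet header); returns False otherwise, including for empty input.
    """
    if isinstance(data, str):
        data = data.encode("utf-8")
    if data.lstrip().startswith(b"-----BEGIN PGP"):
        return True
    return len(data) > 0 and bool(data[0] & 0x80)


print(looks_like_openpgp(b"encryptedData"))  # False: 'e' is 0x65, bit 7 clear
print(looks_like_openpgp(bytes([0x85, 0x01, 0x0C, 0x03])))  # True
print(looks_like_openpgp("-----BEGIN PGP MESSAGE-----\n..."))  # True
```

Calling this on whatever is about to be passed to `gpg.decrypt` (and printing its length and first few bytes) narrows the problem down to either the data path or the key/passphrase side.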