ETDeveloper

Reputation: 80

How to automatically encrypt a file in a GCS Bucket using a public key and GPG encryption

I'm working on an automated process where one specific step requires detecting when a file "lands" or is created in a particular GCS bucket, encrypting it with a public key, producing a file with a ".gpg" extension, and storing it in another folder or bucket, so the next step in the chain can look for those encrypted files and do something else with them.

Maybe I'm just overcomplicating this, but I had thought of using Pub/Sub notifications for Cloud Storage to activate a Cloud Build trigger that runs gpg commands like these:

gpg --import a_public_key.pub.asc
gpg --encrypt-files -r [email protected] gs://some_bucket/somefodler/some_file_here.gz

I feel there MUST be a more straightforward way to do this. BTW, I'm trying to avoid (if possible) any alternative where it's necessary to download the objects first, encrypt them, and upload them back. The files are around 5 GB each.

I really appreciate any help you can provide.

I noticed that you can specify the "encryption type" you would like to have for a specific bucket: https://cloud.google.com/storage/docs/encryption. However, it doesn't look like what I need. As I understand it, that only keeps the objects encrypted while they "live" in the bucket; once they are downloaded or transferred, GCP decrypts them. (I may be wrong, but that's what I understood. Please correct me if I'm wrong.)

Upvotes: 1

Views: 2218

Answers (1)

guillaume blaquiere

Reputation: 76010

There are many parts to your question.

Event trigger

Firstly, a Pub/Sub notification on file creation in Cloud Storage is a good solution. A more modern way is to use the newer Eventarc service, but in the end it does the same thing (it sinks a message into Pub/Sub).
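For example, a minimal way to wire that notification with gsutil (the bucket and topic names are placeholders; OBJECT_FINALIZE limits it to object creation events):

gsutil notification create -t projects/my-project/topics/new-files -f json -e OBJECT_FINALIZE gs://landing-bucket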

Event processing

The use of Cloud Build can be surprising, but it's a convenient way for you to run bash commands. It could be a solution for some edge cases, but for 5 GB files you should use Cloud Functions gen2 or Cloud Run instead. Cloud Build has limitations in terms of parallelism (concurrent builds), and that solution is not very scalable.

  • Cloud Functions gen2 allows you to have up to 32 GB of memory, enough to download one file and keep its encrypted equivalent in memory. You can use Python (or another language's libraries) to perform the same operation in code rather than through the bash CLI. In my company, we use Gnupg (see the sketch after this list).
  • Cloud Run is very similar to Cloud Functions gen2 (it's the same underlying infrastructure), but you have full control over the container, and therefore over the runtime environment. You can create a container with GPG installed on it and invoke the bash operation from Python or any other language to use that installed system binary.
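As an illustration of the first option, here is a minimal sketch of a Cloud Functions gen2 handler, assuming the python-gnupg and google-cloud-storage packages, a destination bucket named encrypted-files, and a public key exported to a public_key.asc file bundled with the function source (all of these names are placeholders):

import os

import functions_framework
import gnupg
from google.cloud import storage

DEST_BUCKET = "encrypted-files"  # placeholder: the bucket the next step reads from

storage_client = storage.Client()


@functions_framework.cloud_event
def encrypt_object(cloud_event):
    # Gen2 GCS triggers deliver a CloudEvent whose data holds the bucket and object names.
    data = cloud_event.data
    bucket_name = data["bucket"]
    object_name = data["name"]

    local_in = os.path.join("/tmp", os.path.basename(object_name))
    local_out = local_in + ".gpg"

    # /tmp is the only writable directory on gen2; the GPG home must exist beforehand.
    os.makedirs("/tmp/gnupg", exist_ok=True)
    gpg = gnupg.GPG(gnupghome="/tmp/gnupg")
    with open("public_key.asc") as key_file:  # public key shipped with the source
        imported = gpg.import_keys(key_file.read())

    try:
        storage_client.bucket(bucket_name).blob(object_name).download_to_filename(local_in)
        with open(local_in, "rb") as plaintext:
            result = gpg.encrypt_file(
                plaintext,
                recipients=imported.fingerprints,
                output=local_out,
                always_trust=True,  # the key was just imported, not locally signed
            )
        if not result.ok:
            raise RuntimeError(f"GPG encryption failed: {result.status}")
        storage_client.bucket(DEST_BUCKET).blob(object_name + ".gpg").upload_from_filename(local_out)
    finally:
        # /tmp is backed by the instance's memory: always clean up (see "File download" below).
        for path in (local_in, local_out):
            if os.path.exists(path):
                os.remove(path)

Keep in mind that python-gnupg is only a wrapper around the gpg binary; if the binary is not present in your runtime, that is exactly the case where the Cloud Run option with GPG installed in the container is the better fit.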

In both cases, set the concurrency to 1 to be sure to process only one file per instance and prevent out-of-memory errors.
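For instance, a possible gen2 deployment of the sketch above (the names, region, and memory value are placeholders to adapt; 16Gi should leave room for a 5 GB file plus its encrypted copy):

gcloud functions deploy encrypt-object --gen2 --runtime=python311 --region=us-central1 --trigger-bucket=landing-bucket --memory=16Gi --concurrency=1 --entry-point=encrypt_object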

File download

Do you have to download the file to encrypt it? What's the concern here? My first answer is: you will have to read all the bytes of your file to encrypt it, so you will transfer it fully in any case, even if only for reading in streaming mode. I personally prefer to download the file and then work on it, instead of stream-reading it. In particular, if the encryption checksum is wrong, the retry is easier because the file is already downloaded.

But, of course, you have to remember to delete the file at the end of the process, unless you keep the content only in a variable in memory; in that case it will be reclaimed automatically.

Encryption and security

Your question about encryption is a great one, and in fact it all depends on what you want to achieve.

Google Cloud ensures that your data are ALWAYS encrypted, both in transit and at rest.

When you upload or download a file (or do anything else, like calling an API), you always go through HTTPS, so your data are encrypted in transit.

On Cloud Storage, the data are encrypted with internal Google Cloud keys. You can choose the location and the rotation frequency of the keys with the CMEK option (Customer-Managed Encryption Keys), but the keys are still owned/hosted by Google Cloud.
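For example, attaching a default CMEK to a bucket is a one-liner (the key path is a placeholder):

gsutil kms encryption -k projects/my-project/locations/us/keyRings/my-ring/cryptoKeys/my-key gs://some_bucket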

You can also supply keys not owned by Google Cloud (for instance from a third-party key-management company) and configure your requests with CSEK (Customer-Supplied Encryption Keys). This time, Google Cloud doesn't keep the keys and can't decrypt your data without being given the key.
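With the Python client, a CSEK is a raw 32-byte AES-256 key supplied per object; a minimal sketch (the key generation here is purely illustrative, a real key comes from your own key management):

import os

from google.cloud import storage

csek = os.urandom(32)  # illustrative only: use the key supplied by your key manager

client = storage.Client()
bucket = client.bucket("some_bucket")

# The exact same key must be supplied again for every later read of this object.
blob = bucket.blob("somefolder/some_file.gz", encryption_key=csek)
blob.upload_from_filename("some_file.gz")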

So yes, if you have the permission, the data are decrypted by Google and sent to you over an encrypted HTTPS connection.

The use of GPG

The use of GPG is for 2 specific use cases:

  • You don't trust Google and you want your own encryption layer, managed by you
  • You want to send the data to a third party that has the decryption key to read it. Like that, even if many third-party companies have access to the same bucket and can download any files, only those with the correct key will be able to decrypt their dedicated files.

A better solution to chain all the steps is to use Cloud Workflows triggered by PubSub (or Eventarc). Like that, you can create a pipeline per file uploaded to Cloud Storage and then (a sketch follows this list):

  • Invoke a Cloud Run service/Cloud Functions gen2 function to encrypt the files
  • Do something after
  • Do something else
  • ....
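A minimal Workflows sketch of that chaining, assuming the workflow is triggered by an Eventarc storage event and that encrypt-files is a deployed Cloud Run service (the step names and URL are placeholders):

main:
  params: [event]
  steps:
    - extract:
        assign:
          - bucket: ${event.data.bucket}
          - name: ${event.data.name}
    - encrypt:
        call: http.post
        args:
          url: https://encrypt-files-xyz-uc.a.run.app
          auth:
            type: OIDC
          body:
            bucket: ${bucket}
            name: ${name}
        result: encrypted
    - do_something_after:
        call: sys.log
        args:
          text: ${"encrypted " + name}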

Upvotes: 6
