Reputation: 80
I'm working on an automated process where one specific step requires detecting when a file "lands" (is created) in a particular GCS bucket, encrypting it with a public key, producing a file with a ".gpg" extension, and storing it in another folder or bucket, so that the next step in the chain can look for those encrypted files and do something else with them.
Maybe I'm just overcomplicating this, but I had thought of using Pub/Sub notifications for Cloud Storage to activate a Cloud Build trigger that would run gpg commands like these:
gpg --import a_public_key.pub.asc
gpg --encrypt-files -r [email protected] gs://some_bucket/somefodler/some_file_here.gz
I feel there MUST be a more straightforward way to do this. BTW, I'm trying to avoid (if possible) any alternative where it's necessary to download the objects first, encrypt them, and upload them back. The files are around 5 GB each.
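Ideally I imagined something like a streaming pipe, so the object never touches a local disk (no idea if this is even a reasonable approach, and the destination bucket below is just an example):
gsutil cp gs://some_bucket/somefodler/some_file_here.gz - \
  | gpg --encrypt -r [email protected] \
  | gsutil cp - gs://other_bucket/encrypted/some_file_here.gz.gpg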
I really appreciate any help you can provide.
I noticed that you can specify the "encryption type" you would like to have on a specific bucket: https://cloud.google.com/storage/docs/encryption. However, it doesn't look like what I need. My understanding is that it keeps the objects encrypted while they "live" in the bucket, but once they are downloaded or transferred, GCP decrypts them. (I may be wrong, but that's how I understood it. Please correct me if I'm wrong.)
Upvotes: 1
Views: 2218
Reputation: 76010
There are many parts in your question.
Event trigger
Firstly, a Pub/Sub notification on file creation in Cloud Storage is a good solution. A more modern way is to use the new Eventarc service, but in the end it does the same thing (it sinks a message into Pub/Sub).
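For instance, such a notification can be set up with a single command (the topic name and bucket here are placeholders):
gsutil notification create -t new-files-topic -f json -e OBJECT_FINALIZE gs://some_bucket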
Event processing
The use of Cloud Build can be surprising, but it's a convenient way to run bash commands, and for some edge cases it could be a solution. For 5 GB files, though, you could use Cloud Functions gen2 or Cloud Run instead: Cloud Build has limitations in terms of parallelism (concurrent builds), so that solution is not very scalable.
In both cases, set the concurrency to 1 to be sure that each instance processes only one file, and to prevent out-of-memory errors.
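As an illustration, assuming a container image that runs the GPG encryption (the image name, region and memory size below are only placeholders), a Cloud Run service could be deployed like this:
gcloud run deploy gpg-encryptor \
  --image gcr.io/my-project/gpg-encryptor \
  --region us-central1 \
  --memory 8Gi \
  --concurrency 1 \
  --no-allow-unauthenticated
The --concurrency 1 flag is what guarantees one file per instance.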
File download
Do you have to download the file to encrypt it? What's the concern here? My first answer is: you will have to read all the bytes of your file to encrypt it. Therefore you will download it fully, even if only for reading, and even in streaming mode. I personally prefer to download the file and then work on it, instead of stream-reading it. In particular, if the encryption checksum is wrong, the retry is easier because the file has already been downloaded.
But, of course, you have to remember to delete the file at the end of the process, unless you keep the content only in a variable in memory, in which case it is discarded automatically.
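A rough sketch of that download-then-process flow could look like this (the bucket paths and recipient are placeholders taken from the question):
# download the object to local disk
gsutil cp gs://some_bucket/somefodler/some_file_here.gz /tmp/some_file_here.gz
# import the public key and encrypt (--trust-model always avoids the interactive trust prompt)
gpg --import a_public_key.pub.asc
gpg --encrypt --trust-model always -r [email protected] -o /tmp/some_file_here.gz.gpg /tmp/some_file_here.gz
# upload the encrypted file and delete the local copies
gsutil cp /tmp/some_file_here.gz.gpg gs://other_bucket/encrypted/
rm /tmp/some_file_here.gz /tmp/some_file_here.gz.gpg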
Encryption and security
Your question about encryption is a great one, and in fact it all depends on what you want to achieve.
Google Cloud ensures that your data are ALWAYS encrypted, both in transit and at rest.
When you upload or download a file (or do anything else, like calling an API), you always go through HTTPS, so your data are encrypted in transit.
On Cloud Storage, the data are encrypted with internal Google Cloud keys. With the CMEK option (Customer Managed Encryption Key), you can choose the location and the rotation frequency of the keys, but the keys are still owned/hosted by Google Cloud.
You can also partner with a third-party company that offers security keys not hosted by Google Cloud and configure your bucket with CSEK (Customer Supplied Encryption Keys). This time, Google Cloud doesn't keep the keys and can't decrypt your data without access to those third-party keys.
So yes, if you have the permission, the data are decrypted by Google Cloud and sent to you over encrypted HTTPS.
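For reference, a default CMEK can be attached to a bucket, and a CSEK can be supplied per request (the key names below are placeholders):
# set a default Customer Managed Encryption Key on the bucket
gsutil kms encryption -k projects/my-project/locations/us/keyRings/my-ring/cryptoKeys/my-key gs://some_bucket
# supply a Customer Supplied Encryption Key (base64-encoded AES-256 key) for a single upload
gsutil -o "GSUtil:encryption_key=BASE64_ENCODED_KEY" cp some_file_here.gz gs://some_bucket/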
The use of GPG
The use of GPG is for 2 specific use cases:
A better solution for chaining all the steps is to use Cloud Workflows triggered by Pub/Sub (or Eventarc). That way, you can create a pipeline per file uploaded to Cloud Storage and run each step of the chain in sequence.
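As an illustration of that wiring (the workflow name, location, bucket and service account are placeholders, and the workflow definition itself is not shown):
# deploy the workflow that chains the encryption steps
gcloud workflows deploy encrypt-pipeline --source=workflow.yaml --location=us-central1
# start one workflow execution per object created in the source bucket
gcloud eventarc triggers create encrypt-on-upload \
  --location=us-central1 \
  --destination-workflow=encrypt-pipeline \
  --destination-workflow-location=us-central1 \
  --event-filters="type=google.cloud.storage.object.v1.finalized" \
  --event-filters="bucket=some_bucket" \
  --service-account=my-sa@my-project.iam.gserviceaccount.com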
Upvotes: 6