HKM

Reputation: 13

How do I merge and zip large files on Google Cloud?

I want to merge a set of CSV files and zip them in GCP.

I will receive a folder containing a lot of CSV files in a GCP bucket (40 GB of data). Once all the data has arrived, I need to merge the CSV files into a single file, zip it, and store it in another location. I only need to do this once a month.

What is the best way to achieve this?

I was planning to use the strategy below, but I don't know if it's a good solution:

  1. A Pub/Sub subscription listens to the bucket folder and invokes a Cloud Function.
  2. The Cloud Function calls Cloud Composer, where a DAG does the merge and zip.

Upvotes: 0

Views: 664

Answers (1)

Ernesto U

Reputation: 806

It might be a lot easier to send the CSV files to a directory on a GCP instance. Once they are there, you can use a cron job to zip the files and then copy the archive into your bucket with gsutil.

If sending the files to the instance is not feasible, you can download them with gsutil, zip them, and upload the zip file again.
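
As a rough sketch of the "zip them" step (not something from the original answer), the script below merges every CSV in a local directory into a single entry of a zip archive. It assumes the files were already downloaded, for example with `gsutil -m cp "gs://SOURCE_BUCKET/folder/*.csv" /data/in/`, that they all share the same one-line header, and that the instance disk has room for the 40 GB of input plus the archive. The function name, the paths, and the bucket names are placeholders.

```python
import zipfile
from pathlib import Path


def merge_csvs_to_zip(src_dir, dest_zip, member_name="merged.csv"):
    """Merge all *.csv files in src_dir into one CSV inside a zip archive.

    Assumption: every file starts with the same single-line header, so the
    header is kept from the first file only. Returns the number of files merged.
    """
    files = sorted(Path(src_dir).glob("*.csv"))
    with zipfile.ZipFile(dest_zip, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # force_zip64 lets the single entry grow past 4 GB.
        with archive.open(member_name, "w", force_zip64=True) as out:
            for index, path in enumerate(files):
                with path.open("rb") as src:
                    # Read line by line so memory use stays small.
                    for line_no, line in enumerate(src):
                        if line_no == 0 and index > 0:
                            continue  # skip the repeated header
                        if not line.endswith(b"\n"):
                            line += b"\n"  # keep rows of adjacent files apart
                        out.write(line)
    return len(files)


if __name__ == "__main__":
    # Hypothetical paths; adjust to wherever gsutil placed the files.
    count = merge_csvs_to_zip("/data/in", "/data/out/merged.zip")
    print(f"merged {count} files")
```

The archive could then be uploaded with something like `gsutil cp /data/out/merged.zip gs://DEST_BUCKET/`, and the script scheduled from cron once a month.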

Either way, you will have to give the instance's service account the proper IAM roles to modify the contents of the bucket, or give it ACL-level access. Finally, don't forget to give your instance the proper access scopes.

Upvotes: 1
