Zach

Reputation: 1351

How can I extract a tar.gz file in a Google Cloud Storage bucket from a Colab Notebook?

As the question states, I'm trying to figure out how I can extract a .tar.gz file that is stored in a GCS Bucket from a Google Colab notebook.

I am able to connect to my bucket via:

from google.colab import auth

auth.authenticate_user()
project_id = 'my-project'
!gcloud config set project {project_id}

However, when I try running a command such as:

!gsutil tar xvzf my-bucket/compressed-files.tar.gz

I get an error. I know that gsutil probably has limited functionality and maybe isn't meant to do what I'm trying to do, so is there a different way to do it?

Thanks!

Upvotes: 1

Views: 10749

Answers (3)

Racana

Reputation: 327

You can create a Dataflow job from a template to decompress files in your bucket. The template is called Bulk Decompress Cloud Storage Files.

You have to specify the input file location, the output location, a failure log file, and a temporary location.
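
If you want to try this route from the notebook, a rough sketch of launching the template with gcloud might look like the following. The job name, region, bucket, and output paths are hypothetical placeholders, and the flag and parameter names are my best recollection of the template's inputs rather than something taken from this answer.

# Rough sketch: run the Bulk Decompress Cloud Storage Files template.
# Job name, region, bucket, and paths are hypothetical placeholders.
!gcloud dataflow jobs run my-decompress-job \
    --gcs-location gs://dataflow-templates/latest/Bulk_Decompress_GCS_Files \
    --region us-central1 \
    --staging-location gs://my-bucket/tmp \
    --parameters \
inputFilePattern=gs://my-bucket/compressed-files.tar.gz,\
outputDirectory=gs://my-bucket/decompressed,\
outputFailureFile=gs://my-bucket/decompressed/failed.csv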

Upvotes: 2

This worked for me. I'm new to Colab and Python itself, so I'm not certain this is the best solution.

!sudo tar -xvf my-bucket/compressed-files.tar.gz
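That path only resolves if the archive is already on the Colab VM's local filesystem. One hedged variant, assuming a hypothetical bucket gs://my-bucket, is to copy the archive down first and then extract it:

# Hypothetical bucket/object names; copy the archive to the Colab VM's
# local disk so tar has a local file to work on.
!gsutil cp gs://my-bucket/compressed-files.tar.gz .
!tar -xvzf compressed-files.tar.gz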

Upvotes: 0

John Dow

Reputation: 806

Google Cloud Storage (GCS) does not natively support unpacking a tar archive. You will have to do this yourself, either on your local machine or from a Compute Engine VM, for instance.
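
In a Colab notebook, the VM itself can play the role of that machine. A minimal sketch of the download-extract-upload round trip, assuming the hypothetical names my-project, my-bucket, and compressed-files.tar.gz from the question, could look like this:

import os
import tarfile
from google.cloud import storage

# Download the archive from the bucket to the Colab VM's local disk.
client = storage.Client(project='my-project')
bucket = client.bucket('my-bucket')
bucket.blob('compressed-files.tar.gz').download_to_filename('archive.tar.gz')

# Unpack it locally.
with tarfile.open('archive.tar.gz', 'r:gz') as tar:
    tar.extractall('extracted')

# Optionally upload the unpacked files back to the bucket.
for root, _, files in os.walk('extracted'):
    for name in files:
        local_path = os.path.join(root, name)
        bucket.blob(local_path).upload_from_filename(local_path)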

Upvotes: 6
