Reputation: 2475
I would like to download publicly available data from google cloud storage. However, because I need to be in a Python3.x environment, it is not possible to use gsutil. I can download individual files with wget as
wget http://storage.googleapis.com/path-to-file/output_filename -O output_filename
However, commands like
wget -r --no-parent https://console.cloud.google.com/path_to_directory/output_directoryname -O output_directoryname
do not seem to work as they just download an index file for the directory. Neither do rsync or curl attempts based on some initial attempts. Any idea of how to download publicly available data on google cloud storage as a directory?
Upvotes: 4
Views: 2290
Reputation: 2593
The approach you mentioned above does not work because Google Cloud Storage doesn't have real "directories". As an example, "path/to/some/files/file.txt" is the entire name of that object. A similarly named object, "path/to/some/files/file2.txt", just happens to share the same naming prefix.
As for how you could fetch these files: The GCS APIs (both XML and JSON) allow you to do an object listing against the parent bucket, specifying a prefix; in this case, you'd want all objects starting with the prefix "path/to/some/files/". You could then make individual HTTP requests for each of the objects specified in the response body. That being said, you'd probably find this much easier to do via one of the GCS client libraries, such as the Python library.
Also, gsutil currently has a GitHub issue open to track adding support for Python 3.
Upvotes: 2