toom

Reputation: 13347

Moving multiple files with gsutil

Let's say I've got the following files in a Google Cloud Storage bucket:

file_A1.csv
file_B2.csv
file_C3.csv

Now I want to move a subset of these files, let's say file_A1.csv and file_B2.csv. Currently I do it like this:

gsutil mv gs://bucket/file_A1.csv gs://bucket/file_A11.csv
gsutil mv gs://bucket/file_B2.csv gs://bucket/file_B22.csv

This approach requires two calls of more or less the same command and moves each file separately. I know that when moving a complete directory I can add the -m option to speed up the process. However, I only want to move a subset of the files and leave the rest untouched in the bucket.

Moving 100 files this way means executing about 100 commands, which becomes quite time-consuming. Is there a way to combine all 100 moves into a single command that also uses the -m option?

Upvotes: 18

Views: 23392

Answers (7)

Matheus Torquato

Reputation: 1639

This worked for me for moving all .txt files from gs://config to gs://config/new_folder:

gsutil -m mv 'gs://config/*.txt' gs://config/new_folder/

I had some problems with the * wildcard in zsh, which tries to expand it before gsutil sees it. That is the reason for the quotes around the source path.

Upvotes: 10

John Kitonyo

Reputation: 2419

This is not widely documented, but it has worked reliably for me.

To move the contents of the third folder to the root or to any folder above it:

gsutil ls gs://my-bucket/first/second/third/ | gsutil -m mv -I gs://my-bucket/first/

And to copy instead of move:

gsutil ls gs://my-bucket/first/second/third/ | gsutil -m cp -I gs://my-bucket/first/

Upvotes: 3

Hisham Karam

Reputation: 1318

You can achieve that in bash by iterating over the gsutil ls output. For example:

  • source folder name: old_folder
  • new folder name: new_folder

for x in $(gsutil ls "gs://<bucket_name>/old_folder"); do y=$(basename -- "$x"); gsutil mv "$x" "gs://<bucket_name>/new_folder/${y}"; done

If you have a huge number of files, you can run the moves in parallel:

N=8 # number of parallel workers
(
for x in $(gsutil ls "gs://<bucket_name>/old_folder"); do
   ((i=i%N)); ((i++==0)) && wait   # after every N jobs, wait for that batch to finish
   y=$(basename -- "$x"); gsutil mv "$x" "gs://<bucket_name>/new_folder/${y}" &
done
wait   # wait for the last batch before leaving the subshell
)
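
For comparison, here is a minimal Python sketch of the same bounded-parallelism idea: build one gsutil mv command per listed URL and run at most a fixed number at a time. The function names, the runner parameter (there so the logic can be tried without calling gsutil) and the decision to skip entries ending in "/" are assumptions for illustration, not part of the loop above.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from posixpath import basename


def build_mv_commands(urls, dest_prefix):
    """Return one `gsutil mv` argument list per source URL."""
    dest_prefix = dest_prefix.rstrip("/")
    commands = []
    for url in urls:
        if url.endswith("/"):
            continue  # assumption: skip subdirectory entries from `gsutil ls`
        commands.append(["gsutil", "mv", url, f"{dest_prefix}/{basename(url)}"])
    return commands


def run_all(commands, workers=8, runner=subprocess.run):
    """Run the commands with at most `workers` in flight; return exit codes."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda cmd: runner(cmd).returncode, commands))
```

Passing the arguments as a list means no shell is involved, so object names need no extra quoting.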

Upvotes: 4

Francis Chasco

Reputation: 11

To do this, you can run the following gsutil command:

gsutil mv gs://bucket_name/common_file_name*  gs://bucket_destiny_name/common_file_name*

In your case, common_file_name is "file_".

Upvotes: 1

brook

Reputation: 247

The lack of the -m flag is the real hang-up here. Facing the same issue, I originally handled it by using Python multiprocessing and os.system to call gsutil. I had 60k files and it was going to take hours. With some experimenting I found that using the Python client gave a 20x speed-up!

If you are willing to move away from gsutil, it's a better approach.

Here is a copy (or move) method. If you create a list of source keys or URIs, you can call it from multiple threads for fast results.

Note: the method returns a tuple of (destination name, exception), which you can load into a dataframe or similar to look for failures.

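import re
from google.cloud import storage

# BUCKET_NAME, THING1 and THING2 are placeholders: define them before this function.
def cp_blob(key=None,bucket=BUCKET_NAME,uri=None,delete_src=False):
    """Copy one blob to a new name in the same bucket; delete the source to make it a move."""
    try:
        if uri:
            # Split 'gs://bucket/key' into the bucket name and the object key
            uri=re.sub('^gs://','',uri)
            bucket,key=uri.split('/',maxsplit=1)
        client=storage.Client()
        bucket=client.get_bucket(bucket)
        blob=bucket.blob(key)
        dest=re.sub(THING1,THING2,blob.name)  ## OR SOME OTHER WAY TO GET NEW DESTINATIONS
        out=bucket.copy_blob(blob,bucket,dest)
        if delete_src:
            blob.delete()
        return out.name, None   # success: (destination name, no error)
    except Exception as e:
        return None, str(e)     # failure: (no name, error message)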
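
The multi-threaded call mentioned above could look roughly like the sketch below. It assumes a worker that accepts a uri keyword and returns a (destination name, error) tuple, as cp_blob does; copy_many and max_workers are hypothetical names chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def copy_many(uris, worker, max_workers=16):
    """Call worker(uri=...) for every URI using a pool of threads.

    Returns (succeeded, failed): a list of destination names and a list
    of (uri, error) pairs for the calls that reported an error.
    """
    uris = list(uris)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda u: worker(uri=u), uris))
    succeeded = [name for name, err in results if err is None]
    failed = [(u, err) for u, (name, err) in zip(uris, results) if err is not None]
    return succeeded, failed
```

To move rather than copy, the worker can be wrapped first, for example functools.partial(cp_blob, delete_src=True).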

Upvotes: 0

evolved

Reputation: 2210

If you have a list of the files you want to move, you can use the -I option of the cp command, which, according to the docs, is also valid for the mv command:

cat filelist | gsutil -m mv -I gs://my-bucket
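
If the list is generated by a program rather than kept in a file, a sketch like the following could feed it to the same command on stdin. The helper names, the runner parameter and the example destination are assumptions for illustration. As far as I can tell, -I moves the objects into the destination under their existing names, so it does not cover per-file renames such as file_A1.csv to file_A11.csv.

```python
import subprocess


def url_list(bucket, names):
    """Return newline-separated gs:// URLs, one per object name."""
    return "".join(f"gs://{bucket}/{name}\n" for name in names)


def move_listed(bucket, names, destination, runner=subprocess.run):
    """Feed the URL list to `gsutil -m mv -I <destination>` on stdin."""
    cmd = ["gsutil", "-m", "mv", "-I", destination]
    return runner(cmd, input=url_list(bucket, names), text=True, check=True)
```

For example, move_listed("bucket", ["file_A1.csv", "file_B2.csv"], "gs://bucket/moved/") would send two URLs to a single gsutil process; gs://bucket/moved/ is a made-up destination.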

Upvotes: 9

rein

Reputation: 33465

gsutil does not currently support this, but you could create a number of shell scripts, each performing a portion of the moves, and run them concurrently.

Note that gsutil mv is based on the syntax of the Unix mv command, which also doesn't support the feature you're asking for.
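
One way to generate those scripts is sketched below: split the list of (source, destination) pairs into a fixed number of batches and render each batch as shell script text. The function names are hypothetical, and writing the scripts to disk and launching them is left out.

```python
import shlex


def split_batches(items, n):
    """Split items into n round-robin batches (some may be empty)."""
    return [items[i::n] for i in range(n)]


def batch_script(moves):
    """Render one batch of (src, dst) pairs as shell script text."""
    lines = ["#!/bin/sh"]
    lines += [f"gsutil mv {shlex.quote(s)} {shlex.quote(d)}" for s, d in moves]
    return "\n".join(lines) + "\n"
```

Each script still moves its files one at a time; the speed-up comes only from running several scripts side by side.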

Upvotes: 4
