Richard

Reputation: 65530

gsutil: Argument list too long

I am trying to upload many thousands of files to Google Cloud Storage, with the following command:

gsutil -m cp *.json gs://mybucket/mydir

But I get this error:

-bash: Argument list too long

What is the best way to handle this? I can obviously write a bash script to iterate over different filename prefixes:

gsutil -m cp 92*.json gs://mybucket/mydir
gsutil -m cp 93*.json gs://mybucket/mydir
gsutil -m cp ...*.json gs://mybucket/mydir

But the problem is that I don't know in advance what my filenames are going to be, so writing that command isn't trivial.

Is there either a way to handle this with gsutil natively (I don't think so, from the documentation), or a way to handle it in bash where I can list, say, 10,000 files at a time and pipe them to the gsutil command?

Upvotes: 7

Views: 5101

Answers (3)

Mike Schwartz

Reputation: 12145

Eric's answer should work, but another option would be to rely on gsutil's built-in wildcarding, by quoting the wildcard expression:

gsutil -m cp "*.json" gs://mybucket/mydir

To explain further: the "Argument list too long" error comes from the shell, which hits the operating system's limit on the total size of command-line arguments when it expands the wildcard itself. By quoting the wildcard you prevent the shell from expanding it; instead the literal string is passed to gsutil, which expands the wildcard in a streaming fashion, i.e. it expands while performing the operations, so it never needs to buffer an unbounded amount of expanded text. As a result you can use gsutil wildcards over arbitrarily large expressions. The same is true for gsutil wildcards over object names, so for example this would work:

gsutil -m cp "gs://my-bucket1/*" gs://my-bucket2

even if there are a billion objects at the top-level of gs://my-bucket1.
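For reference, that limit is exposed by the OS. A quick way (assuming a Linux-like system with getconf) to compare it with the size of your expanded file list is to use the shell builtin printf, which isn't subject to the limit:

getconf ARG_MAX              # maximum combined size of argument list and environment
printf '%s ' *.json | wc -c  # approximate size of the expanded *.json list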

Upvotes: 25

Tom Fenech

Reputation: 74605

Here's a way you could do it, using xargs to limit the number of files that are passed to gsutil at once. Null bytes are used to prevent problems with spaces or newlines in the filenames.

printf '%s\0' *.json | xargs -0 sh -c 'copy_all () {
    gsutil -m cp "$@" gs://mybucket/mydir
}
copy_all "$@"' sh

Here we define a function which puts the file arguments in the right place in the gsutil command, before the destination. The trailing sh after the quoted script fills $0, so none of the filenames supplied by xargs is consumed as the script name. xargs invokes the script as few times as possible, passing as many filenames as will fit within the argument-length limit on each invocation.
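If you want to see how the batching works, a purely illustrative dry run (using echo in place of gsutil) prints the size of each batch:

printf '%s\0' *.json | xargs -0 sh -c 'echo "this batch contains $# files"' sh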

Alternatively you can define the function separately and then export it (this is bash-specific):

copy_all () {
    gsutil -m cp "$@" gs://mybucket/mydir
}
export -f copy_all
printf '%s\0' *.json | xargs -0 bash -c 'copy_all "$@"' bash

Upvotes: 1

Eric Renouf

Reputation: 14500

If your filenames are free of newlines you could use gsutil cp's ability to read the list of files from stdin (the -I option), like

find . -maxdepth 1 -type f -name '*.json' | gsutil -m cp -I gs://mybucket/mydir

or, if you're not sure whether your names are safe and your find and xargs support null-delimited output, you could do

find . -maxdepth 1 -type f -name '*.json' -print0 | xargs -0 -I {} gsutil -m cp {} gs://mybucket/mydir
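Note that -I {} makes xargs run one gsutil invocation per file. If that turns out to be slow, a sketch that batches many files per invocation while keeping the destination last (reusing the sh -c trick from Tom Fenech's answer, and assuming the same bucket path as in the question) would be:

find . -maxdepth 1 -type f -name '*.json' -print0 |
  xargs -0 sh -c 'gsutil -m cp "$@" gs://mybucket/mydir' sh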

Upvotes: 3
