Reputation: 65530
I am trying to upload many thousands of files to Google Cloud Storage, with the following command:
gsutil -m cp *.json gs://mybucket/mydir
But I get this error:
-bash: Argument list too long
What is the best way to handle this? I can obviously write a bash script to iterate over different numbers:
gsutil -m cp 92*.json gs://mybucket/mydir
gsutil -m cp 93*.json gs://mybucket/mydir
gsutil -m cp ...*.json gs://mybucket/mydir
But the problem is that I don't know in advance what my filenames are going to be, so writing that command isn't trivial.
Is there either a way to handle this with gsutil natively (I don't think so, from the documentation), or a way to handle this in bash where I can list, say, 10,000 files at a time and pipe them to the gsutil command?
Upvotes: 7
Views: 5101
Reputation: 12145
Eric's answer should work, but another option would be to rely on gsutil's built-in wildcarding, by quoting the wildcard expression:
gsutil -m cp "*.json" gs://mybucket/mydir
To explain more: The "Argument list too long" error is coming from the shell, which has a limited size buffer for expanded wildcards. By quoting the wildcard you prevent the shell from expanding the wildcard and instead the shell passes that literal string to gsutil. gsutil then expands the wildcard in a streaming fashion, i.e., expanding it while performing the operations, so it never needs to buffer an unbounded amount of expanded text. As a result you can use gsutil wildcards over arbitrarily large expressions. The same is true when using gsutil wildcards over object names, so for example this would work:
gsutil -m cp "gs://my-bucket1/*" gs://my-bucket2
even if there are a billion objects at the top-level of gs://my-bucket1.
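For reference, the buffer the shell is running out of is the kernel's ARG_MAX, the maximum combined size of the argument list and environment handed to exec (on Linux the exact accounting is a little more involved, but this gives the ballpark). A quick way to see the limit, and roughly how big the expanded *.json list would be:
getconf ARG_MAX                  # total bytes allowed for argv + environment
printf '%s ' *.json | wc -c      # approximate size of the expanded wildcard (printf is a bash builtin, so this doesn't hit the limit)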
Upvotes: 25
Reputation: 74605
Here's a way you could do it, using xargs to limit the number of files that are passed to gsutil at once. Null bytes are used to prevent problems with spaces or newlines in the filenames.
printf '%s\0' *.json | xargs -0 sh -c 'copy_all () {
    gsutil -m cp "$@" gs://mybucket/mydir
}
copy_all "$@"' sh
Here we define a function which puts the file arguments in the right place in the gsutil command (the trailing sh argument becomes $0, so none of the filenames are swallowed by it). xargs will invoke the command the minimum number of times needed to process all the arguments, passing the maximum number of filename arguments possible each time.
Alternatively, you can define the function separately and then export it (this is bash-specific):
copy_all () {
    gsutil -m cp "$@" gs://mybucket/mydir
}
export -f copy_all
printf '%s\0' *.json | xargs -0 bash -c 'copy_all "$@"' bash
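If you want to cap the batch size explicitly (the question mentions passing, say, 10,000 files at a time), xargs -n limits how many arguments go into each invocation. A minimal sketch, with 10000 as an arbitrary batch size:
# at most 10000 filenames per gsutil invocation; the trailing sh fills $0
printf '%s\0' *.json | xargs -0 -n 10000 sh -c 'gsutil -m cp "$@" gs://mybucket/mydir' sh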
Upvotes: 1
Reputation: 14500
If your filenames are safe from newlines, you could use gsutil cp's ability to read from stdin, like
find . -maxdepth 1 -type f -name '*.json' | gsutil -m cp -I gs://mybucket/mydir
or, if you're not sure whether your names are safe and your find and xargs support it, you could do
find . -maxdepth 1 -type f -name '*.json' -print0 | xargs -0 -I {} gsutil -m cp {} gs://mybucket/mydir
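Note that -I {} makes xargs run a separate gsutil invocation for every single file, so the -m parallelism has little to do within each call. If that turns out to be slow, a sketch that keeps the null-delimited safety while batching many files per invocation (the same sh -c trick as the xargs answer above) would be:
# pass many null-delimited filenames per gsutil call; the trailing sh fills $0
find . -maxdepth 1 -type f -name '*.json' -print0 |
    xargs -0 sh -c 'gsutil -m cp "$@" gs://mybucket/mydir' sh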
Upvotes: 3