Reputation: 1448
I used to append datasets in a bucket in gcloud using:
gsutil compose gs://bucket/obj1 [gs://bucket/obj2 ...] gs://bucket/composite
However, today when I tried to append some data, the terminal printed the error: CommandException: The compose command accepts at most 33 arguments.
I didn't know about this restriction. How can I append more than 33 files in my bucket? Is there another command-line tool? I would like to avoid creating a virtual machine for what looks like a rather simple task.
I checked the help using gsutil help compose, but it didn't help much. There is only a note saying "Note that there is a limit (currently 32) to the number of components that can be composed in a single operation." and no hint at a workaround.
Upvotes: 0
Views: 1310
Reputation: 40071
Could you not do it recursively, in batches?
I've not tried this.
Given an arbitrary list of files (FILES):
While there is more than 1 file in FILES:
  Take the first 32 files in FILES and gsutil compose them into a temp file.
  Replace those 32 files in FILES with the 1 temp file.
The file that remains is everything composed.
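The loop above can also be written without recursion as a plain while loop over a bash array. This is a minimal dry-run sketch, not a tested tool: the function name compose_all, the TMP_PREFIX value and the tmp-N object names are hypothetical, and it only prints the gsutil compose commands rather than running them.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print the gsutil compose commands that merge all
# arguments into one object, at most BATCH_SIZE sources per command.
# compose_all, TMP_PREFIX and the tmp-N names are hypothetical.
BATCH_SIZE=32
TMP_PREFIX="gs://bucket/tmp"

compose_all() {
  local files=("$@")
  local n=0 tmp
  # While there is more than 1 file left...
  while [ "${#files[@]}" -gt 1 ]; do
    n=$((n + 1))
    tmp="${TMP_PREFIX}-${n}"
    # ...compose the first BATCH_SIZE files into a temp object...
    echo "gsutil compose ${files[*]:0:${BATCH_SIZE}} ${tmp}"
    # ...and replace them in the list with that single temp object.
    files=("${tmp}" "${files[@]:${BATCH_SIZE}}")
  done
  # The one remaining object contains everything, in the original order.
  echo "result: ${files[0]}"
}
```

For example, compose_all gs://bucket/obj{1..100} prints four compose commands; from the second one on, the previous temp object is the first of the 32 sources.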
The question piqued my curiosity and gave me an opportunity to improve my bash ;-)
A rough-and-ready proof-of-concept bash script that generates batches of gsutil compose commands for arbitrary (limited by the string formatting %04) numbers of files.
GSUTIL="gsutil compose"
BATCH_SIZE="32"
# gsutil compose requires the sources and the destination to be in the
# same bucket (or no bucket, for a dry run); no trailing slash
SRC="gs://bucket01"
DST="${SRC}"
# Generate test FILES
FILES=()
for N in $(seq -f "%04g" 1 100); do
  FILES+=("${SRC}/file-${N}")
done
function squish() {
  local LST=("$@")
  local LEN=${#LST[@]}
  if [ "${LEN}" -le "1" ]; then
    # Zero or one object; nothing to compose
    return 0
  fi
  # Only unique for this configuration; be careful
  local COMPOSITE
  COMPOSITE=$(printf '%s/composite-%04d' "${DST}" "${LEN}")
  if [ "${LEN}" -le "${BATCH_SIZE}" ]; then
    # Batch can be composed with one command
    echo "${GSUTIL} ${LST[*]} ${COMPOSITE}"
    return 0
  fi
  # Compose 1st batch of files
  # NB Provide start:size
  echo "${GSUTIL} ${LST[*]:0:${BATCH_SIZE}} ${COMPOSITE}"
  # Remove batch from LST (keep it an array)
  # NB Provide start (to end is implied)
  local REM=("${LST[@]:${BATCH_SIZE}}")
  # Prepend composite from above batch to the next run
  local NXT=("${COMPOSITE}" "${REM[@]}")
  squish "${NXT[@]}"
}
squish "${FILES[@]}"
Running with BATCH_SIZE=3, no buckets and 12 files yields:
gsutil compose file-0001 file-0002 file-0003 composite-0012
gsutil compose composite-0012 file-0004 file-0005 composite-0010
gsutil compose composite-0010 file-0006 file-0007 composite-0008
gsutil compose composite-0008 file-0008 file-0009 composite-0006
gsutil compose composite-0006 file-0010 file-0011 composite-0004
gsutil compose composite-0004 file-0012 composite-0002
NOTE how composite-0012 is created by the first command and then knitted into the subsequent command.
I'll leave it to you to improve throughput by not threading the output of each step into the next: instead, chop the list into batches, run the gsutil compose commands for those batches in parallel, and then compose the resulting batch composites.
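That parallel variant can be sketched as a tree reduction: each level splits the current list into consecutive batches of at most 32, the compose commands within one level do not depend on each other (a real script could start them as background jobs and wait before the next level), and the batch composites, in order, become the input to the next level. This is a dry-run sketch under stated assumptions: compose_tree, PREFIX and the level-L-part-P object names are hypothetical, and it only prints commands.

```shell
#!/usr/bin/env bash
# Dry-run sketch of a tree reduction. Commands within one level are
# independent; only the levels must run in sequence.
# compose_tree, PREFIX and the level-L-part-P names are hypothetical.
BATCH_SIZE=32
PREFIX="gs://bucket/level"

compose_tree() {
  local files=("$@")
  local level=0 part start out
  while [ "${#files[@]}" -gt 1 ]; do
    level=$((level + 1))
    part=0
    local next=()
    echo "# level ${level}"
    # Split the current list into consecutive batches of BATCH_SIZE.
    for ((start = 0; start < ${#files[@]}; start += BATCH_SIZE)); do
      local batch=("${files[@]:start:BATCH_SIZE}")
      if [ "${#batch[@]}" -eq 1 ]; then
        # A lone leftover object is passed through unchanged.
        next+=("${batch[0]}")
        continue
      fi
      part=$((part + 1))
      out="${PREFIX}-${level}-part-${part}"
      echo "gsutil compose ${batch[*]} ${out}"
      next+=("${out}")
    done
    # The batch composites, in order, are the input to the next level.
    files=("${next[@]}")
  done
  echo "result: ${files[0]}"
}
```

With 100 objects this prints four independent commands for level 1 and a single command for level 2, so the longest chain of dependent compose calls is two rather than four.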
Upvotes: 2
Reputation: 2593
The docs say that you may only combine 32 components in a single operation, but there is no limit to the number of components that can make up a composite object.
So, if you have more than 32 objects to concatenate, you may perform multiple compose operations, composing 32 objects at a time until you eventually get all of them composed together.
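One way to do this without temp objects is to keep a single destination and repeatedly compose it with the next objects back into itself. This is a minimal dry-run sketch that assumes the destination object may also be listed as the first source (the usual way to append with compose; check the compose documentation before relying on it). The function name append_all is hypothetical, and the sketch only prints the commands.

```shell
#!/usr/bin/env bash
# Dry-run sketch: append objects to one destination, at most 32 sources
# per call. After the first call DEST is itself the first source, so each
# later call adds at most 31 new objects. append_all is hypothetical.
BATCH_SIZE=32

append_all() {
  local dest="$1"
  shift
  local files=("$@")
  local i=0
  # The first call can use all BATCH_SIZE slots for new objects.
  local take=${BATCH_SIZE}
  while [ "${i}" -lt "${#files[@]}" ]; do
    if [ "${i}" -eq 0 ]; then
      echo "gsutil compose ${files[*]:i:take} ${dest}"
    else
      echo "gsutil compose ${dest} ${files[*]:i:take} ${dest}"
    fi
    i=$((i + take))
    # Later calls reserve one slot for DEST itself.
    take=$((BATCH_SIZE - 1))
  done
}
```

For example, append_all gs://bucket/all gs://bucket/obj{1..100} prints four commands: 32 objects, then the destination plus 31, plus 31, plus the last 6.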
Upvotes: 1