Reputation: 21
I have 5 different processes running on different virtual machines (VMs) on EC2 creating 5 different files (f1.txt, f2.txt, f3.txt, f4.txt, f5.txt). These VMs are started at roughly the same time but will finish at different times.
~ Wait for these 5 files to be written out
~ Merge them and create a new file, e.g. f.txt = f1.txt + f2.txt + f3.txt + f4.txt + f5.txt
~ Questions:
# How can I determine when all 5 files are ready and written out?
# Can I use s3cat or some similar command-line tool to do that? Does s3cat have similar semantics to Unix cat, e.g.

    cat s3://mybucket/f1.txt > s3://mybucket/f.txt
    cat s3://mybucket/f2.txt >> s3://mybucket/f.txt
    cat s3://mybucket/f3.txt >> s3://mybucket/f.txt
    cat s3://mybucket/f4.txt >> s3://mybucket/f.txt
    cat s3://mybucket/f5.txt >> s3://mybucket/f.txt
The s3cat examples on GitHub didn't show this use case.
The generated output file (f.txt) is for use by a downstream process.
Upvotes: 2
Views: 2101
Reputation: 12945
I think you want to use multipart upload instead of uploading a bunch of files and catting them together.
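A minimal sketch of this idea in Python, using boto3 and the bucket/key names from the question: S3's UploadPartCopy lets you stitch existing objects into a new one server-side, without downloading anything. One caveat: every part except the last must be at least 5 MB, so this only works if f1.txt through f4.txt each meet that minimum.

    import boto3

    s3 = boto3.client('s3')
    bucket = 'mybucket'
    source_keys = ['f1.txt', 'f2.txt', 'f3.txt', 'f4.txt', 'f5.txt']

    # Start a multipart upload for the merged object
    upload = s3.create_multipart_upload(Bucket=bucket, Key='f.txt')

    # Copy each source object in as one part, in order
    parts = []
    for i, key in enumerate(source_keys, start=1):
        resp = s3.upload_part_copy(
            Bucket=bucket, Key='f.txt',
            UploadId=upload['UploadId'],
            PartNumber=i,
            CopySource={'Bucket': bucket, 'Key': key},
        )
        parts.append({'PartNumber': i, 'ETag': resp['CopyPartResult']['ETag']})

    # Finalize; f.txt now contains f1..f5 concatenated
    s3.complete_multipart_upload(
        Bucket=bucket, Key='f.txt',
        UploadId=upload['UploadId'],
        MultipartUpload={'Parts': parts},
    )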
Upvotes: 0
Reputation: 2046
If you know the names of the keys you are using for the 5 files you are uploading, you can just poll for them. If you know Python, boto is a great module for interfacing with S3 and would make handling the above a cinch. Also, S3 guarantees that a file won't appear to other clients until it has been completely uploaded, so you don't have to worry about reading partial files.
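A minimal sketch of the polling approach, assuming boto3 (the successor to boto) and the bucket/key names from the question:

    import time
    import boto3
    from botocore.exceptions import ClientError

    s3 = boto3.client('s3')
    bucket = 'mybucket'
    expected = ['f1.txt', 'f2.txt', 'f3.txt', 'f4.txt', 'f5.txt']

    def all_files_ready():
        # HEAD each key; a missing key raises ClientError (404)
        for key in expected:
            try:
                s3.head_object(Bucket=bucket, Key=key)
            except ClientError:
                return False
        return True

    while not all_files_ready():
        time.sleep(10)  # poll every 10 seconds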
Boto is also a good way to concatenate the output if you are already using it to check for the files.
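Along the same lines, a sketch of client-side concatenation with boto3: stream each source object into a local file, then upload the merged result (f.txt and mybucket are the names from the question):

    import boto3

    s3 = boto3.client('s3')
    bucket = 'mybucket'

    # Download each part in order and append it to a local merged file
    with open('f.txt', 'wb') as out:
        for key in ['f1.txt', 'f2.txt', 'f3.txt', 'f4.txt', 'f5.txt']:
            body = s3.get_object(Bucket=bucket, Key=key)['Body']
            for chunk in iter(lambda: body.read(1024 * 1024), b''):
                out.write(chunk)

    # Upload the merged file back to S3 for the downstream process
    s3.upload_file('f.txt', bucket, 'f.txt')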
Upvotes: 1