Reputation: 1833
So, I'm not sure if the commands are actually important, but for background this is the command I was running:
aws s3 ls s3://REDACTED/ | jq -nR '[inputs | split(" +"; null).[3]] | reverse.[] | "bkt --ttl '1h' -- aws s3 cp s3://REDACTED/\(.) - | tac"' -r | bash | head
Basically, I wanted to get the last 10 lines of the concatenation of all the objects (ordered by name, lexicographically). To do this, I list the bucket, reverse the listing, construct a command per object that cats it in reverse, run those commands, and limit the output with head. For those who don't know, bkt is a wrapper that adds caching to any command...
When I ran that I was a bit surprised that it didn't quit after the first 10 lines; in fact it seems like it is still running all the aws cp commands...
My question here is: why didn't the bash command in the pipe abort? It should have gotten a SIGPIPE after head exits, right?
EDIT: This is a less complicated command that does the same thing:
seq 100000000000 | sed -E $'s/(.*)/sh -c \'echo \\1; sleep .1\'/' | sh | head
EDIT: Note: As @Fravadona brings up, please use caution when piping commands into a shell interpreter. This is not an example of a robust solution to any specific problem.
Upvotes: 1
Views: 74
Reputation: 14493
Expanding my comment into an answer, with some more details:
The OP's code is:
aws s3 ls s3://REDACTED/ |
jq -nR '[inputs | split(" +"; null).[3]] | reverse.[] | "bkt --ttl '1h' -- aws s3 cp s3://REDACTED/\(.) - | tac"' -r |
bash |
head
This will load all data for all objects and process them in memory.
An alternative solution is to reorder the commands: instead of tac | head, use tail | tac. This reduces the memory requirements on tac, which only has to store/reverse the last 10 lines of each file:
aws s3 ls s3://REDACTED/ |
jq -nR '[inputs | split(" +"; null).[3]] | reverse.[] | "bkt --ttl '1h' -- aws s3 cp s3://REDACTED/\(.) - | tail -10"' -r |
bash |
head
**Note: I do NOT have access to S3 to check the commands below. This is based on the AWS documentation and might contain typos.**
The solution above will still load the full objects from S3. To improve performance and reduce the amount of downloaded data (if the S3 objects are big), using get-object should be considered: it provides an option to download a fixed amount of data from the end of the object. Assuming it is possible to estimate the maximum size of the last 10 lines (let's say 2k), something like this should work:
aws s3 ls s3://REDACTED/ |
jq -nR '[inputs | split(" +"; null).[3]] | reverse.[] | "bkt --ttl '1h' -- aws s3api get-object --bucket REDACTED --key \(.) --range bytes=-2000 /dev/stdout | tail -10"' -r |
bash |
head
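For reference, here is what the suffix-range request looks like on its own (a sketch based on the AWS CLI docs, not verified against a real bucket; the key name is just a placeholder). Note that s3api get-object requires an outfile argument and also prints the response metadata as JSON to stdout, so downloading into a temporary file first may be more robust than piping through /dev/stdout:
# fetch only the last 2000 bytes of one object, then keep its last 10 lines
aws s3api get-object --bucket REDACTED --key some-object.log --range bytes=-2000 /tmp/obj-tail
tail -10 /tmp/obj-tail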
Upvotes: 0
Reputation: 26727
The aws command cannot get the SIGPIPE, as it's not writing to a closed pipe.
When you run:
seq 100000000000 | sed -E $'s/(.*)/sh -c \'echo \\1; sleep .1\'/' | sh | head
The process which writes to the pipe is this one: sh -c 'echo N; sleep .1'. So the final sh doesn't get a SIGPIPE, which is why it keeps running.
You can notice the difference when you run this:
seq 100000000000 | sed -E $'s/(.*)/echo \\1/' | sh | head
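If you want to check which processes were actually killed by SIGPIPE, bash's PIPESTATUS array helps (a sketch; it assumes you run it in bash, on a system where death by SIGPIPE is reported as exit status 128+13 = 141):
seq 100000000000 | sed -E $'s/(.*)/echo \\1/' | sh | head
echo "${PIPESTATUS[@]}"
Here the final sh runs echo itself (a builtin), so it is the process writing to the closed pipe; you should typically see something like 141 141 141 0, meaning seq, sed and sh were all killed by SIGPIPE while head exited normally.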
Upvotes: 1
Reputation: 1833
I think I figured out the answer while writing up the question...
What I think is happening is that the aws/bkt command actually gets the SIGPIPE, and bash never sees it because it is just gluing up the pipe... The easiest fix in my case was to change bash to bash -e so that it quits after the subprocess fails...
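For anyone curious, the same fix applied to the simplified command from the question looks like this (a sketch; it relies on the generated sh -c dying of SIGPIPE with a non-zero status once head has exited, which -e then turns into an exit of the outer shell):
seq 100000000000 | sed -E $'s/(.*)/sh -c \'echo \\1; sleep .1\'/' | sh -e | head
After head quits, the next generated command fails while writing to the closed pipe, and -e makes the outer shell stop instead of launching the rest.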
Upvotes: 0