Dave Butler

Reputation: 1833

Pipe commands not getting killed after head exits

So, I'm not sure if the commands are actually important, but for background this is the command I was running:

aws s3 ls s3://REDACTED/ | jq -nR '[inputs | split(" +"; null).[3]] | reverse.[] | "bkt --ttl '1h' -- aws s3 cp s3://REDACTED/\(.) - | tac"' -r | bash | head

Basically, I wanted to get the last 10 lines of the combined contents of all the objects (taken by name in lexicographic order). To do this, I list the bucket, reverse the listing, construct a command per object that cats it with its lines reversed (tac), run those commands, and limit the output with head. For those who don't know, bkt is a wrapper that adds caching to any command...

When I ran that, I was a bit surprised that it didn't quit after the first 10 lines; in fact, it seems like it is still running all the aws cp commands...

My question here is: why didn't the bash command in the pipe abort? It should have gotten a SIGPIPE after head exited, right?

EDIT: This is a less complicated command that does the same thing:

seq 100000000000 | sed -E $'s/(.*)/sh -c \'echo \\1; sleep .1\'/' | sh | head

EDIT: Note: As @Fravadona brings up, please use caution when piping commands into a shell interpreter. This is not an example of a robust solution to any specific problem.

Upvotes: 1

Views: 74

Answers (3)

dash-o

Reputation: 14493

Expanding my comment into an answer, with some more details:

The OP's code is:

aws s3 ls s3://REDACTED/ |
jq -nR '[inputs | split(" +"; null).[3]] | reverse.[] | "bkt --ttl '1h' -- aws s3 cp s3://REDACTED/\(.) - | tac"' -r |
bash -e |
head

This will load all the data for all the objects and process it in memory. An alternative solution is to reorder the commands: instead of tac | head, use tail | tac. This reduces the memory requirement, since only the last 10 lines of each object have to be kept and reversed:

aws s3 ls s3://REDACTED/ |
jq -nR '[inputs | split(" +"; null).[3]] | reverse.[] | "bkt --ttl '1h' -- aws s3 cp s3://REDACTED/\(.) - | tail -10"' -r |
bash |
head
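
As a quick local sanity check of the reordering (no S3 needed, just plain coreutils), both of these print the last 3 lines in reverse order, but the second one only ever has to hold 3 lines:

seq 10 | tac | head -3
seq 10 | tail -3 | tac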

Note: I do NOT have access to S3 to check the below. This is based on the AWS documentation and might have typos.

The solution above will still download full objects from S3. To improve performance and reduce the amount of downloaded data (if the S3 objects are big), using get-object should be considered: it provides an option to download a fixed amount of data from the end of the object. Assuming the maximum size of the last 10 lines can be estimated (let's assume 2k), it is possible to write something like:

aws s3 ls s3://REDACTED/ |
jq -nR '[inputs | split(" +"; null).[3]] | reverse.[] | "bkt --ttl '1h' -- aws s3api get-object --bucket REDACTED --key \(.) --range bytes=-2000 /dev/stdout | tail -10"' -r |
bash |
head
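
For reference, bytes=-2000 is an HTTP suffix range ("the last 2000 bytes of the object"), so only a small chunk of each object is downloaded before tail -10 trims it down to lines; bump the 2k if the last 10 lines of an object can be longer than that. Note also that get-object writes the body to its outfile argument (here /dev/stdout) and prints the API response metadata as JSON on stdout, so in practice that metadata may need to be sent elsewhere or filtered out.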

Upvotes: 0

Philippe

Reputation: 26727

The aws command cannot get the SIGPIPE, as it is not writing to a closed pipe.

When you run:

seq 100000000000 | sed -E $'s/(.*)/sh -c \'echo \\1; sleep .1\'/' | sh | head

The process that writes to the pipe is this one: sh -c 'echo N; sleep .1'. So the final sh never gets SIGPIPE, which is why it keeps running.
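
To make this visible, here is a small variation (just for illustration; the echoed messages are made up): the inner sh -c is killed by SIGPIPE once head exits, while the outer sh carries on with the next command:

printf '%s\n' "sh -c 'echo a; sleep .2; echo b'" 'echo outer sh still running >&2' | sh | head -1

Only a shows up on stdout, and the stderr message is still printed afterwards, because the outer sh was never signalled.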

You can notice the difference when you run this:

seq 100000000000 | sed -E $'s/(.*)/echo \\1/' | sh | head
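
Here the echo is executed by the final sh itself, so that sh is the process writing to the closed pipe. A quick way to check (bash only, because of PIPESTATUS; 141 = 128 + SIGPIPE):

seq 100000000000 | sed -E $'s/(.*)/echo \\1/' | sh | head > /dev/null
echo "${PIPESTATUS[@]}"

This typically prints 141 141 141 0: seq, sed and sh are all killed by SIGPIPE once head exits.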

Upvotes: 1

Dave Butler

Reputation: 1833

I think I figured out the answer while writing up the question...

What I think is happening is that the aws/bkt command actually gets the SIGPIPE, and bash never sees it because it is just plumbing the pipes together... The easiest fix in my case was to change bash to bash -e so that it quits after a subprocess fails...
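
The same change applied to the simplified command from the question would be (same command as above, just sh -e instead of sh):

seq 100000000000 | sed -E $'s/(.*)/sh -c \'echo \\1; sleep .1\'/' | sh -e | head

Once head exits, the next generated sh -c dies of SIGPIPE (exit status 141), and -e makes the outer sh stop instead of moving on to the next command.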

Upvotes: 0
