Juan LB

Reputation: 31

Piping stdout to two different commands

I have been working on this all day and have more or less got it to run, but I still need some help polishing my command line.

Situation: I am using bedtools, which takes two tab-delimited files containing genomic intervals (one per line) with some additional data in further columns. More precisely, I am running the window function, which outputs, for each interval in the "a" file, all the intervals in the "b" file that fall within the window defined by the -l and -r parameters. A more precise explanation can be found here.

An example of the function, taken from their website:

$ cat A.bed
chr1  1000  2000

$ cat B.bed
chr1  500   800
chr1  10000 20000

$ bedtools window -a A.bed -b B.bed -l 200 -r 20000
chr1  1000   2000  chr1  10000  20000

$ bedtools window -a A.bed -b B.bed -l 300 -r 20000
chr1  1000   2000  chr1  500    800
chr1  1000   2000  chr1  10000  20000

Question: I want to use that stdout to do several things in one go.

  1. Count the number of lines in the original stdout. For that I use wc -l
  2. Then:
    • cut columns 4-6: cut -f 4-6
    • sort the lines and keep only those that are not repeated: sort | uniq -u
    • save to a file: tee file.bed
    • count the number of lines of the new stdout, again with wc -l
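
The steps above can be sketched one at a time on a tiny input, which makes the expected numbers easy to check before everything is chained into one pipe. This is only an illustration: the file names sample_window.txt and sample_unique.bed and the three sample lines are made up, and intermediate files are used instead of a pipe.

```shell
# Hypothetical sample standing in for the bedtools window output (tab-separated).
printf 'chr1\t1000\t2000\tchr1\t500\t800\n' > sample_window.txt
printf 'chr1\t1000\t2000\tchr1\t10000\t20000\n' >> sample_window.txt
printf 'chr1\t3000\t4000\tchr1\t10000\t20000\n' >> sample_window.txt

# Step 1: count the lines of the original output.
# tr strips the padding that some wc implementations add.
total=$(wc -l < sample_window.txt | tr -d ' ')

# Step 2: keep columns 4-6, drop every line that occurs more than once,
# save the result, then count what is left.
cut -f 4-6 sample_window.txt | sort | uniq -u > sample_unique.bed
unique=$(wc -l < sample_unique.bed | tr -d ' ')

echo "total lines: $total"
echo "unique lines: $unique"
```

Note that uniq -u discards every copy of a repeated line, so the interval chr1 10000 20000, which appears twice here, is dropped entirely. If one copy of each repeated line should be kept instead, sort -u would be the tool to use.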

So I have managed to get it to work, more or less, with this:

windowBed -a ARS_saccer3.bed -b ./Peaks/WTappeaks_-Mit_sorted.bed -r 0 -l 10000 | tee >(wc -l) >(cut -f 7-13 | sort | uniq -u | tee ./Window/windowBed_UP10.bed | wc -l)

This kind of works: I get the output file correctly and the values show on screen, but like this:

juan@juan-VirtualBox:~/Desktop/sf_Biolinux_sf/IGV/Collisions$ 448
543

The first number is the second wc -l; I don't understand why it shows first. Also, after the second number, the cursor sits waiting for input instead of showing a new command prompt, so I assume something in the command line remains unfinished as it is right now. This is probably something very basic, but I would be very grateful to anyone who cares to explain a little more about what is going on. For anyone willing to offer solutions, bear in mind that I would like to keep this pipe on one line, without needing to run an additional sh script or anything else.

Thanks

Upvotes: 0

Views: 164

Answers (1)

Robin Green

Reputation: 33083

When you create a "forked pipeline" like this, bash has to run the two halves of the fork concurrently, otherwise where would it buffer the stdout for the other half of the fork? So it is essentially like running both subshells in the background, which explains why you get the results in an order you did not expect (due to the concurrency) and why the output is dumped unceremoniously on top of your command prompt. Nothing is left unfinished: the prompt was printed before the numbers arrived, so pressing Enter gives you a fresh one.

You can avoid both of these problems by writing the two outputs to separate temporary files, waiting for everything to finish, and then concatenating the temporary files in the order you expect, like this:

windowBed -a ARS_saccer3.bed -b ./Peaks/WTappeaks_-Mit_sorted.bed -r 0 -l 10000 | tee >(wc -l >tmp1) >(cut -f 7-13 | sort | uniq -u | tee ./Window/windowBed_UP10.bed | wc -l >tmp2)
wait
cat tmp1 tmp2
rm tmp1 tmp2
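
If temporary files are unwelcome, one possible alternative (not part of the approach above) is to avoid process substitution altogether and let awk report the total on stderr while passing every line through. The shell waits for every command in an ordinary pipeline, and sort cannot emit anything until awk has finished, so the total should come out first and the prompt should return only when everything is done. It also sidesteps the question of whether wait covers process substitutions, which as far as I know depends on the bash version. The sketch below is a minimal demo under stated assumptions: demo_window.txt and demo_unique.bed are made-up names, cat stands in for the windowBed command, the columns are 4-6 rather than 7-13, and the capture into counts exists only so the result can be inspected.

```shell
# Hypothetical sample standing in for the windowBed output (tab-separated).
printf 'chr1\t1000\t2000\tchr1\t500\t800\n' > demo_window.txt
printf 'chr1\t1000\t2000\tchr1\t10000\t20000\n' >> demo_window.txt
printf 'chr1\t3000\t4000\tchr1\t10000\t20000\n' >> demo_window.txt

# awk passes every line through unchanged and, at end of input,
# writes the total line count to stderr. The rest of the pipe is the
# same cut | sort | uniq -u | tee | wc -l chain as in the question.
# Stderr and stdout are merged and captured here only for the demo.
counts=$(
  {
    cat demo_window.txt |
      awk '{ print } END { print NR > "/dev/stderr" }' |
      cut -f 4-6 | sort | uniq -u | tee demo_unique.bed | wc -l
  } 2>&1
)

# First line: total number of input lines. Second line: lines left after uniq -u.
printf '%s\n' "$counts"
```

On the command line the capture wrapper would be dropped, leaving a single pipe that starts with the windowBed command and continues with the awk, cut, sort, uniq, tee and wc stages. Writing to "/dev/stderr" from awk is assumed to work here; it does in the common awk implementations on Linux.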

Upvotes: 1
