Reputation: 31
Been working on this all day, kind of got it to run, but I may still need some help to polish my code language.
Situation: I am using bedtools, which takes two tab-delimited files containing genomic intervals (one per line) with some additional data in further columns. More precisely, I am running the window function, which outputs, for each interval in the "a" file, all the intervals in the "b" file that fall into the window I have defined with the parameters -l and -r. A more precise explanation can be found here.
An example of the function, taken from their website:
$ cat A.bed
chr1 1000 2000
$ cat B.bed
chr1 500 800
chr1 10000 20000
$ bedtools window -a A.bed -b B.bed -l 200 -r 20000
chr1 1000 2000 chr1 10000 20000
$ bedtools window -a A.bed -b B.bed -l 300 -r 20000
chr1 1000 2000 chr1 500 800
chr1 1000 2000 chr1 10000 20000
Question: So the thing is that I want to use that stdout to do a number of things in one shot.
wc -l
cut -f 4-6
sort | uniq -u
tee file.bed
wc -l
So I have managed to get it to work, more or less, with this:
windowBed -a ARS_saccer3.bed -b ./Peaks/WTappeaks_-Mit_sorted.bed -r 0 -l 10000 | tee >(wc -l) >(cut -f 7-13 | sort | uniq -u | tee ./Window/windowBed_UP10.bed | wc -l)
This kind of works, because I get the output file correctly and the values show on screen, but like this:
juan@juan-VirtualBox:~/Desktop/sf_Biolinux_sf/IGV/Collisions$ 448
543
The first number comes from the second wc -l, and I don't understand why it shows first. Also, after the second number the cursor just sits there waiting for input instead of showing a new prompt, so I assume something remains unfinished with the command line as it is right now.
This is probably something very basic, but I will be very grateful to anyone who cares to explain a little more about programming to me.
For anyone willing to offer solutions, bear in mind that I would like to keep this pipe in one line, without the need to run additional sh or anything else.
Thanks
Upvotes: 0
Reputation: 33083
When you create a "forked pipeline" like this, bash has to run the two halves of the fork concurrently, otherwise where would it buffer the stdout for the other half of the fork? So it is essentially like running both subshells in the background, which explains why you get the results in an order you did not expect (due to the concurrency) and why the output is dumped unceremoniously on top of your command prompt.
You can avoid both of these problems by writing the two outputs to separate temporary files, waiting for everything to finish, and then concatenating the temporary files in the order you expect, like this:
windowBed -a ARS_saccer3.bed -b ./Peaks/WTappeaks_-Mit_sorted.bed -r 0 -l 10000 | tee >(wc -l >tmp1) >(cut -f 7-13 | sort | uniq -u | tee ./Window/windowBed_UP10.bed | wc -l >tmp2)
wait
cat tmp1 tmp2
rm tmp1 tmp2
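If you would rather keep everything in one pipeline with no temporary files, one alternative (a different technique from the commands above) is to let each branch print a labelled count on its normal stdout and pipe the whole tee into sort. This relies on bash starting the process substitutions before it applies tee's >/dev/null redirection, so both branches still write into the pipe that feeds sort. sort cannot finish until every branch has closed that pipe, so the prompt only comes back once everything is done, and sorting by label makes the order predictable. It also sidesteps wait, which as far as I know is not guaranteed to wait for every process substitution in all bash versions. A minimal sketch, assuming bash; sample_intervals, the labels "all" and "unique", the column number and the mktemp output file are made-up stand-ins for your windowBed call and file names:

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for the windowBed call: three tab-separated lines.
sample_intervals() {
    printf 'chr1\t100\tpeakA\nchr1\t200\tpeakA\nchr1\t300\tpeakB\n'
}

# Hypothetical output file, playing the role of ./Window/windowBed_UP10.bed.
outfile=$(mktemp)

# Both branches inherit the pipe to the final sort as their stdout, because
# bash starts them before applying tee's ">/dev/null" redirection.
# Each branch prints one labelled line; sort then orders the lines by label.
report=$(
    sample_intervals |
        tee >(printf 'all\t%d\n' "$(wc -l)") \
            >(printf 'unique\t%d\n' "$(cut -f 3 | sort | uniq -u | tee "$outfile" | wc -l)") \
            >/dev/null |
        sort
)

printf '%s\n' "$report"
```

With this sample data the report should show a count of 3 for "all" and 1 for "unique", and the output file should contain only peakB. In your case you would replace sample_intervals with the windowBed command, use cut -f 7-13, and drop the report=$( ... ) wrapper if you only want the numbers on screen.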
Upvotes: 1