Reputation: 784
I have some lines that I get in sorted order using the following:
grep ENSG00000006114 File | sort -V
chr17 35874900 35879174 ABCD0000006114:I25 -
chr17 35874901 35879174 ABCD0000006114:I25 -
chr17 35875548 35875671 ABCD0000006114:E27 -
chr17 35875672 35877289 ABCD0000006114:I26 -
chr17 35877290 35877445 ABCD0000006114:E26 -
chr17 35877446 35877932 ABCD0000006114:I25 -
However, I want to delete the leading rows that contain ':I' in the fourth column, up to the first row that contains ':E'. For that I have been trying something like:
grep ENSG00000006114 File | sort -V | awk '{if ($4 ~ /:I/ && NR==1) next};1'
However, there may be more than one such row at the top, as in the case above. How do I exclude the rows containing :I until the first :E occurs, so that my final output would be:
chr17 35875548 35875671 ABCD0000006114:E27 -
chr17 35875672 35877289 ABCD0000006114:I26 -
chr17 35877290 35877445 ABCD0000006114:E26 -
chr17 35877446 35877932 ABCD0000006114:I25 -
Upvotes: 2
Views: 50
Reputation: 37424
You could also just pipe it right back into grep:
$ grep ENSG00000006114 File | sort -V | grep -A 10000000000000000 :E
chr17 35875548 35875671 ABCD0000006114:E27 -
chr17 35875672 35877289 ABCD0000006114:I26 -
chr17 35877290 35877445 ABCD0000006114:E26 -
chr17 35877446 35877932 ABCD0000006114:I25 -
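If the very large -A count feels arbitrary, a sed address range is one possible alternative. This swaps in sed and is not what the command above uses. Like the grep version, it matches :E anywhere on the line rather than only in the fourth column. A minimal sketch, with the question's sorted rows inlined in a variable named sample purely for illustration:

```shell
# Sample rows copied from the question (already sorted).
sample='chr17 35874900 35879174 ABCD0000006114:I25 -
chr17 35874901 35879174 ABCD0000006114:I25 -
chr17 35875548 35875671 ABCD0000006114:E27 -
chr17 35875672 35877289 ABCD0000006114:I26 -
chr17 35877290 35877445 ABCD0000006114:E26 -
chr17 35877446 35877932 ABCD0000006114:I25 -'

# -n suppresses default output; the range /:E/,$ runs from the first
# line matching :E through the last line, and p prints that range.
result=$(printf '%s\n' "$sample" | sed -n '/:E/,$p')
printf '%s\n' "$result"
```

In the real pipeline the sed call would simply replace the final grep: grep ENSG00000006114 File | sort -V | sed -n '/:E/,$p'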
Upvotes: 1
Reputation: 204015
Assuming the grep+sort are useful in that order due to your input file being enormous, all you need from awk is:
grep ENSG00000006114 File | sort -V | awk '$4~/:E/{f=1} f'
and if the file isn't huge you can lose the grep:
sort -V File | awk '!/ENSG00000006114/{next} $4~/:E/{f=1} f'
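For anyone unfamiliar with the flag idiom, here is the same logic spelled out with comments. This is only a sketch: the function name skip_until_first_E and the variable name found are made up for illustration, and the sample rows are the ones from the question.

```shell
# Hypothetical helper: reads already-sorted rows on stdin and drops
# everything before the first row whose 4th field contains :E.
skip_until_first_E() {
  awk '
    $4 ~ /:E/ { found = 1 }   # first :E row switches the flag on
    found                     # bare condition: print the row while the flag is set
  '
}

# Sample rows copied from the question.
sample='chr17 35874900 35879174 ABCD0000006114:I25 -
chr17 35874901 35879174 ABCD0000006114:I25 -
chr17 35875548 35875671 ABCD0000006114:E27 -
chr17 35875672 35877289 ABCD0000006114:I26 -
chr17 35877290 35877445 ABCD0000006114:E26 -
chr17 35877446 35877932 ABCD0000006114:I25 -'

result=$(printf '%s\n' "$sample" | skip_until_first_E)
printf '%s\n' "$result"
```

Because the flag is never reset, later :I rows (I26, I25) are kept once the first :E row has been seen, which is the behaviour the question asks for.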
Upvotes: 4
Reputation: 785581
You can use this awk:
grep ENSG00000006114 File | sort -V |
awk 'p==1 && $4 ~ /:E/{p=2} !p && $4 ~ /:I/{p=1} p==1{next} 1'
chr17 35875548 35875671 ABCD0000006114:E27 -
chr17 35875672 35877289 ABCD0000006114:I26 -
chr17 35877290 35877445 ABCD0000006114:E26 -
chr17 35877446 35877932 ABCD0000006114:I25 -
When p==0 && $4 matches :I, we set p=1.
When p==1, we skip that record and move to the next.
When p==1 && $4 matches :E, we set p=2, thus allowing the remaining records to print.
Upvotes: 3