AishwaryaKulkarni

Reputation: 784

Remove the first lines until the first occurrence of a regular expression in a column

I have some lines that I get in order using the following:

grep ENSG00000006114 File | sort -V 
chr17   35874900    35879174    ABCD0000006114:I25  -
chr17   35874901    35879174    ABCD0000006114:I25  -
chr17   35875548    35875671    ABCD0000006114:E27  -
chr17   35875672    35877289    ABCD0000006114:I26  -
chr17   35877290    35877445    ABCD0000006114:E26  -
chr17   35877446    35877932    ABCD0000006114:I25  -

However, I want to delete the leading rows whose fourth column contains ':I', up to the first row that contains ':E'. I have been trying something like:

grep ENSG00000006114 File | sort -V | awk '{if ($4 ~ /:I/ && NR==1) next};1'

However, there may be more than one such row at the top, as in the case above, and this command only drops the first. How do I exclude the rows containing :I until the first :E occurs, so that my final output is:

   chr17   35875548    35875671    ABCD0000006114:E27  -
   chr17   35875672    35877289    ABCD0000006114:I26  -
   chr17   35877290    35877445    ABCD0000006114:E26  -
   chr17   35877446    35877932    ABCD0000006114:I25  -

Upvotes: 2

Views: 50

Answers (3)

James Brown

Reputation: 37424

You could also just pipe it right back into grep:

$ grep ENSG00000006114 File | sort -V  | grep -A 10000000000000000 :E
chr17   35875548    35875671    ABCD0000006114:E27  -
chr17   35875672    35877289    ABCD0000006114:I26  -
chr17   35877290    35877445    ABCD0000006114:E26  -
chr17   35877446    35877932    ABCD0000006114:I25  -
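
If the oversized -A count feels awkward, the same "print from the first match to the end" behavior can be sketched with a sed address range. This swaps grep for sed, so it is an alternative rather than the command above; the function name from_first_E and the shortened sample rows are made up for illustration.

```shell
# Print every line from the first one containing ":E" through end of input.
# Like the grep version, this looks at the whole line, not only column 4.
from_first_E() {
  sed -n '/:E/,$p'
}

# Hypothetical sample rows modeled on the question's data.
printf '%s\n' \
  'chr17 35874900 35879174 ABCD0000006114:I25 -' \
  'chr17 35875548 35875671 ABCD0000006114:E27 -' \
  'chr17 35875672 35877289 ABCD0000006114:I26 -' |
  from_first_E
```

With this input the :I25 row is dropped and the :E27 and :I26 rows are printed.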

Upvotes: 1

Ed Morton

Reputation: 204015

Assuming the grep+sort are useful in that order due to your input file being enormous, all you need from awk is:

grep ENSG00000006114 File | sort -V | awk '$4~/:E/{f=1} f'

and if the file isn't huge you can lose the grep:

sort -V File | awk '!/ENSG00000006114/{next} $4~/:E/{f=1} f'
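
To see why the flag works, here is a small self-contained sketch of the same awk program run on a few rows shaped like the ones in the question. The function names sample_rows and skip_until_first_E are hypothetical helpers, not part of the original commands.

```shell
# Hypothetical sample data mirroring the rows in the question.
sample_rows() {
  cat <<'EOF'
chr17 35874900 35879174 ABCD0000006114:I25 -
chr17 35874901 35879174 ABCD0000006114:I25 -
chr17 35875548 35875671 ABCD0000006114:E27 -
chr17 35875672 35877289 ABCD0000006114:I26 -
EOF
}

# f is unset (false) until column 4 first contains ":E". It is never reset,
# so the bare pattern `f` prints that row and every row after it.
skip_until_first_E() {
  awk '$4 ~ /:E/ { f = 1 } f'
}

sample_rows | skip_until_first_E
```

The two leading :I25 rows are suppressed because f is still unset when they are read; the :E27 row sets f, and the later :I26 row is printed because f stays set.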

Upvotes: 4

anubhava

Reputation: 785581

You can use this awk:

grep ENSG00000006114 File | sort -V |
awk 'p==1 && $4 ~ /:E/{p=2} !p && $4 ~ /:I/{p=1} p==1{next} 1'

chr17   35875548    35875671    ABCD0000006114:E27  -
chr17   35875672    35877289    ABCD0000006114:I26  -
chr17   35877290    35877445    ABCD0000006114:E26  -
chr17   35877446    35877932    ABCD0000006114:I25  -

  • When p==0 and $4 matches :I, we set p=1.
  • While p==1, we skip the record and move on to the next.
  • When p==1 and $4 matches :E, we set p=2, which lets the remaining records print.
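
The three states can be easier to follow with the same awk program spread over several lines and commented. This is only a reformatted sketch of the one-liner above; the wrapper name skip_first_I_block and the shortened sample rows are assumptions for illustration.

```shell
skip_first_I_block() {
  awk '
    p == 1 && $4 ~ /:E/ { p = 2 }   # first :E after the skipped rows: stop skipping for good
    !p     && $4 ~ /:I/ { p = 1 }   # first :I seen while p is still 0: start skipping
    p == 1              { next }    # while skipping: drop the row
    1                               # any other row is printed
  '
}

# Hypothetical sample rows modeled on the question.
printf '%s\n' \
  'chr17 35874900 35879174 ABCD0000006114:I25 -' \
  'chr17 35874901 35879174 ABCD0000006114:I25 -' \
  'chr17 35875548 35875671 ABCD0000006114:E27 -' \
  'chr17 35875672 35877289 ABCD0000006114:I26 -' |
  skip_first_I_block
```

One behavior worth knowing: because p only becomes 1 at the first :I row, input that starts with an :E row has that row printed, and the first run of :I rows after it is then skipped. A simple "print from the first :E onward" flag would keep those rows, so the two approaches agree only when the input begins with :I rows, as in the question.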

Upvotes: 3
