Reputation: 1150
I have a FASTA file, input.fa, that looks like this:
>coucou
GAGAGATAGTATAGATATATAGGATATATA
>hello_world
GATATATTCTCTCTGAFAGACGACGACFGACTACTACGAC
>ziva_wesh
HAHTAHTAHTAHCGAGAGACAGCAGCAGCACTTACTACATCHBACAHCAHCAHA
I would like to get rid of both
>coucou
GAGAGATAGTATAGATATATAGGATATATA
and
>ziva_wesh
HAHTAHTAHTAHCGAGAGACAGCAGCAGCACTTACTACATCHBACAHCAHCAHA
What I am doing is this (based on this solution by @Hai Vu):
$ awk '/hello/{getline;next} 1' input.fa | awk '/coucou/{getline;next} 1'
>ziva_wesh
HAHTAHTAHTAHCGAGAGACAGCAGCAGCACTTACTACATCHBACAHCAHCAHA
Is there a way of doing this (using an awk, sed, or perl script) without piping the first awk result into a second awk command? (Something like awk '/hello&coucou/{getline;next} 1' input.fa)
Thanks for your answer!
Upvotes: 0
Views: 376
Reputation: 58401
This might work for you (GNU sed):
sed -r '/>(coucou|ziva_wesh)/,+1d' file
This deletes each range of two lines: the line containing >coucou or >ziva_wesh, and the line that follows it.
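The addr,+1 address form is a GNU extension, so on a system without GNU sed the same two-line deletion could be approximated in awk with a skip counter. This is only a sketch: it recreates the sample file from the question in a temporary file, and the variable name skip is my own choice.

```shell
# Recreate the sample file from the question in a temporary location.
fa=$(mktemp)
printf '%s\n' \
  '>coucou' \
  'GAGAGATAGTATAGATATATAGGATATATA' \
  '>hello_world' \
  'GATATATTCTCTCTGAFAGACGACGACFGACTACTACGAC' \
  '>ziva_wesh' \
  'HAHTAHTAHTAHCGAGAGACAGCAGCAGCACTTACTACATCHBACAHCAHCAHA' > "$fa"

# A matching header sets the counter to 2; while the counter is positive,
# the current line is dropped and the counter is decremented.
result=$(awk '/^>(coucou|ziva_wesh)/ { skip = 2 } skip > 0 { skip--; next } 1' "$fa")
printf '%s\n' "$result"
rm -f "$fa"
```

With the sample input, only the >hello_world record is printed.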
Upvotes: 1
Reputation: 785128
A simple sed command can also handle this:
sed -nr '/>(hello|coucou)/{N;d};p' file
>ziva_wesh
HAHTAHTAHTAHCGAGAGACAGCAGCAGCACTTACTACATCHBACAHCAHCAHA
Upvotes: 3
Reputation: 10865
One simple way:
$ awk '/hello/{getline;next} /coucou/{getline;next} 1' input.fa
>ziva_wesh
HAHTAHTAHTAHCGAGAGACAGCAGCAGCACTTACTACATCHBACAHCAHCAHA
Or if you prefer:
$ awk '/(hello)|(coucou)/{getline;next} 1' input.fa
>ziva_wesh
HAHTAHTAHTAHCGAGAGACAGCAGCAGCACTTACTACATCHBACAHCAHCAHA
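Both commands above match hello or coucou anywhere on a line and assume each sequence sits on exactly one line. If your headers can share substrings, or your sequences wrap over several lines, a variant along these lines may be safer. It is a sketch under my own assumptions: the ID is taken to be the first whitespace-delimited word after ">", and the drop variable and its comma-separated format are hypothetical names I picked for illustration.

```shell
# Recreate the sample file from the question in a temporary location.
fa=$(mktemp)
printf '%s\n' \
  '>coucou' \
  'GAGAGATAGTATAGATATATAGGATATATA' \
  '>hello_world' \
  'GATATATTCTCTCTGAFAGACGACGACFGACTACTACGAC' \
  '>ziva_wesh' \
  'HAHTAHTAHTAHCGAGAGACAGCAGCAGCACTTACTACATCHBACAHCAHCAHA' > "$fa"

filtered=$(awk -v drop='coucou,hello_world' '
  # Build a lookup table of the exact IDs to remove.
  BEGIN { n = split(drop, ids, ","); for (i = 1; i <= n; i++) unwanted[ids[i]] = 1 }
  # On each header, decide whether the whole record is kept.
  /^>/ { id = substr($1, 2); keep = !(id in unwanted) }
  # Print header and sequence lines only for kept records
  # (lines before the first header are dropped, since keep is unset).
  keep
' "$fa")
printf '%s\n' "$filtered"
rm -f "$fa"
```

Because the decision is made once per header and reused for every following line, a record is removed as a whole however many lines its sequence spans.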
Upvotes: 3