tlorin

Reputation: 1150

Remove multiple lines, and the line following each, based on a string in bash

I have a FASTA file, input.fa, that looks like this:

>coucou
GAGAGATAGTATAGATATATAGGATATATA
>hello_world
GATATATTCTCTCTGAFAGACGACGACFGACTACTACGAC
>ziva_wesh
HAHTAHTAHTAHCGAGAGACAGCAGCAGCACTTACTACATCHBACAHCAHCAHA

I would like to get rid of both

>coucou
GAGAGATAGTATAGATATATAGGATATATA

and

>ziva_wesh
HAHTAHTAHTAHCGAGAGACAGCAGCAGCACTTACTACATCHBACAHCAHCAHA

What I am currently doing (based on this solution by @Hai Vu) is:

$ awk '/hello/{getline;next} 1' input.fa | awk '/coucou/{getline;next} 1'
>ziva_wesh
HAHTAHTAHTAHCGAGAGACAGCAGCAGCACTTACTACATCHBACAHCAHCAHA

Is there a way to do this (using awk, sed or a perl script) without piping the output of the first awk command into a second one? (Something like awk '/hello&coucou/{getline;next} 1' input.fa.)

Thanks for your answer!

Upvotes: 0

Views: 376

Answers (3)

potong

Reputation: 58401

This might work for you (GNU sed):

sed -r '/>(coucou|ziva_wesh)/,+1d' file

This deletes each two-line range: the line containing >coucou or >ziva_wesh, plus the line that follows it.
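A self-contained sketch of the range form, assuming GNU sed: the `addr,+N` address is a GNU extension, and `-E` is used here in place of `-r` (both enable extended regexes in GNU sed). The `^`/`$` anchors are an addition not in the answer above; they restrict the match to whole header lines, so a header such as `>coucou2` would not be deleted by accident. The function name `drop_records_gnu` is hypothetical.

```shell
#!/bin/sh
# Sketch (GNU sed only): delete each header named coucou or ziva_wesh
# together with the single line that follows it.
# "addr,+1" is a GNU extension meaning "this line and the next 1 line".
drop_records_gnu() {
  sed -E '/^>(coucou|ziva_wesh)$/,+1d'
}

# The question's sample data, fed through a heredoc.
cat <<'EOF' | drop_records_gnu
>coucou
GAGAGATAGTATAGATATATAGGATATATA
>hello_world
GATATATTCTCTCTGAFAGACGACGACFGACTACTACGAC
>ziva_wesh
HAHTAHTAHTAHCGAGAGACAGCAGCAGCACTTACTACATCHBACAHCAHCAHA
EOF
# prints:
# >hello_world
# GATATATTCTCTCTGAFAGACGACGACFGACTACTACGAC
```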

Upvotes: 1

anubhava

Reputation: 785128

A simple sed command can also handle this:

sed -nr '/>(hello|coucou)/{N;d};p' file
>ziva_wesh
HAHTAHTAHTAHCGAGAGACAGCAGCAGCACTTACTACATCHBACAHCAHCAHA
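The same command spelled out one step per line, as a hedged sketch: the sed commands are unchanged, only `-r` is swapped for `-E` and the `{...}` block is split across lines, a layout that non-GNU seds tend to accept more readily than the `{N;d}` one-liner. The function name `drop_hello_coucou` is hypothetical.

```shell
#!/bin/sh
# Annotated equivalent of: sed -nr '/>(hello|coucou)/{N;d};p'
#   -n : turn off auto-printing; only the explicit "p" prints.
#   N  : on a matching header, append the next line (its sequence) to the pattern space.
#   d  : discard header + sequence and start the next cycle, so "p" is never reached.
#   p  : print every line that did not match.
drop_hello_coucou() {
  sed -n -E '/>(hello|coucou)/{
N
d
}
p'
}

cat <<'EOF' | drop_hello_coucou
>coucou
GAGAGATAGTATAGATATATAGGATATATA
>hello_world
GATATATTCTCTCTGAFAGACGACGACFGACTACTACGAC
>ziva_wesh
HAHTAHTAHTAHCGAGAGACAGCAGCAGCACTTACTACATCHBACAHCAHCAHA
EOF
# prints:
# >ziva_wesh
# HAHTAHTAHTAHCGAGAGACAGCAGCAGCACTTACTACATCHBACAHCAHCAHA
```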

Upvotes: 3

jas

Reputation: 10865

One simple way:

$ awk '/hello/{getline;next} /coucou/{getline;next} 1' input.fa 
>ziva_wesh
HAHTAHTAHTAHCGAGAGACAGCAGCAGCACTTACTACATCHBACAHCAHCAHA

Or if you prefer:

$ awk '/(hello)|(coucou)/{getline;next} 1' input.fa 
>ziva_wesh
HAHTAHTAHTAHCGAGAGACAGCAGCAGCACTTACTACATCHBACAHCAHCAHA
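All of the approaches above skip exactly one line after a matching header (`getline`, `N`, `,+1`), which works for this sample because each sequence sits on a single line. FASTA files often wrap sequences over several lines, and then the extra lines would survive. Below is a sketch that drops whole records instead, assuming header names are matched exactly against the first whitespace-separated field of the header; `drop_fasta_records` and its comma-separated argument are hypothetical. The `>coucou` sequence from the question is wrapped onto two lines here purely for illustration.

```shell
#!/bin/sh
# Hypothetical helper: drop_fasta_records NAMES
# NAMES is a comma-separated list of header names (without the ">") to remove.
# Reads FASTA on stdin and writes the remaining records to stdout.
drop_fasta_records() {
  awk -v drop="$1" '
    BEGIN {
      n = split(drop, names, ",")
      for (i = 1; i <= n; i++) skip[">" names[i]] = 1
    }
    # At each header, decide whether the whole record is kept.
    # $1 is the header up to the first space, e.g. ">coucou".
    /^>/ { keep = !($1 in skip) }
    # Print the header and every sequence line of kept records.
    keep
  '
}

cat <<'EOF' | drop_fasta_records 'hello_world,coucou'
>coucou
GAGAGATAGTATAG
ATATATAGGATATATA
>hello_world
GATATATTCTCTCTGAFAGACGACGACFGACTACTACGAC
>ziva_wesh
HAHTAHTAHTAHCGAGAGACAGCAGCAGCACTTACTACATCHBACAHCAHCAHA
EOF
# prints:
# >ziva_wesh
# HAHTAHTAHTAHCGAGAGACAGCAGCAGCACTTACTACATCHBACAHCAHCAHA
```

Because the match is exact rather than a substring test, `hello` alone would no longer remove `>hello_world`; pass the full header name.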

Upvotes: 3
