Problem with filtering some parts of a text file in bash

Question

I have a file like this small example:

>ENSG00000004142|ENST00000003607|POLDIP2|||2118
Sequence unavailable
>ENSG00000003056|ENST00000000412|M6PR|9099001;9102084|9099001;9102551|2756
CCAGGTTGTTTGCCTCTGGTCGGAAAGGGAAACTACCCCTGCTTCCACTCTGACAGCAGA

The file contains many "Sequence unavailable" entries. I want to get rid of those transcripts, so that the result looks like this:

>ENSG00000003056|ENST00000000412|M6PR|9099001;9102084|9099001;9102551|2756
CCAGGTTGTTTGCCTCTGGTCGGAAAGGGAAACTACCCCTGCTTCCACTCTGACAGCAGA

I tried to filter out those parts in bash using

grep -A 2 "Sequence" your.fa | grep -v "\-\-" | sed -n '/Sequence/!p' > new.fa

However, this removes only the "Sequence unavailable" line and not its header (the line starting with ">" above each sequence, which is the identifier for that sequence).

How can I filter them out in bash or Python?

Answers (1)
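
One possible approach, sketched here in Python: read the file record by record (a header line starting with ">" plus every line up to the next header) and write a record only if none of its sequence lines contains "Sequence unavailable". This is a minimal sketch, not a definitive implementation. The function name `filter_fasta` and its `marker` parameter are made up for illustration, and the sketch assumes every record begins with a ">" header line, as in the example above.

```python
import sys


def filter_fasta(lines, marker="Sequence unavailable"):
    """Yield the lines of every record whose body does not contain `marker`.

    `filter_fasta` and `marker` are hypothetical names chosen for this sketch.
    """
    record = []  # header plus sequence lines of the record being read

    def usable(rec):
        # Only the lines after the header are checked for the marker.
        return not any(marker in body_line for body_line in rec[1:])

    for line in lines:
        if line.startswith(">") and record:
            # A new header closes the previous record.
            if usable(record):
                yield from record
            record = []
        record.append(line)

    # The last record is not followed by another header, so handle it here.
    if record and usable(record):
        yield from record


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumed usage: python filter_fasta.py your.fa > new.fa
    with open(sys.argv[1]) as handle:
        sys.stdout.writelines(filter_fasta(handle))
```

Because a whole record is buffered before anything is written, the header is dropped together with its "Sequence unavailable" line, which is the step the grep/sed pipeline in the question misses. Sequences that span several lines are kept intact.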
