Cyndi Kaulitz

Reputation: 75

Remove pattern from each line in fasta file in

I have a fasta file, file.fasta, that has the following patterns:

>firstnumber 01abc_numericsequence    
CGTAATCG  
>secondnumber 01abc_anothernumericsequence  
GGTAAACC    

and so on, but I'd like the output to be something like:

>firstnumber   
CGTAATCG  
>secondnumber   
GGTAAACC  

How can I delete the pattern 01abc and everything that goes after it in each line, and overwrite the file.fasta?

Please, can anyone provide a solution?

Upvotes: 0

Views: 91

Answers (2)

Cyndi Kaulitz

Reputation: 75

I've tried

sed 's/01abc*//' file.fasta

The problem is that it removed the pattern but left both _numericsequence and _anothernumericsequence in place. Also, the changes were not saved in file.fasta.
Then, I tried

ex -sc '%s/\(\01abc\).*/\1/ | x' file.fasta

That removed both _numericsequence and _anothernumericsequence, but it kept the pattern itself, which I also want to remove. Finally, I tried

ex -sc '%s/\(\ \).*/\1/ | x' file.fasta

And it worked, because the other lines don't contain any spaces in this case.
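
For comparison, the first sed attempt can probably be repaired with a small change. In a regular expression, c* means "zero or more c characters", not "anything after 01abc", which is why the suffix was left behind; .* matches the rest of the line. The following is a minimal sketch, assuming the sample data from the question written to a scratch file (the names sample and result are only for illustration). It prints to standard output and does not modify any file.

```shell
# Sample input from the question, written to a scratch file.
sample=$(mktemp)
printf '%s\n' \
  '>firstnumber 01abc_numericsequence' \
  'CGTAATCG' \
  '>secondnumber 01abc_anothernumericsequence' \
  'GGTAAACC' > "$sample"

# 'c*' only matches zero or more literal c characters, so 's/01abc*//'
# leaves the _numericsequence suffix behind. '.*' matches the rest of
# the line; the leading ' *' also removes the spaces before the pattern.
result=$(sed 's/ *01abc.*//' "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```

Unlike the space-based version, this only touches lines that actually contain 01abc.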

Upvotes: 0

justaguy

Reputation: 3022

cat fasta

>firstnumber 01abc_numericsequence    
CGTAATCG  
>secondnumber 01abc_anothernumericsequence  
GGTAAACC


awk '/^>/ {$0=$1} 1' fasta

>firstnumber
CGTAATCG  
>secondnumber
GGTAAACC

sed '/^>/ s/ .*//' fasta

>firstnumber
CGTAATCG  
>secondnumber
GGTAAACC

Both the sed and awk commands remove everything from the first space (inclusive) onward on every line that starts with >.
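
Note that both commands print to standard output and leave the input file unchanged. One way to overwrite the file, as the question asks, is to write to a temporary file and move it over the original. This is a minimal sketch, assuming a shell with mktemp available; the workdir variable and the .tmp suffix are only for illustration. GNU sed also has an -i option for in-place editing, but its syntax differs on BSD/macOS sed, so the temporary-file route is used here.

```shell
# Build a small sample file in a scratch directory (data from the question).
workdir=$(mktemp -d)
printf '%s\n' \
  '>firstnumber 01abc_numericsequence' \
  'CGTAATCG' \
  '>secondnumber 01abc_anothernumericsequence' \
  'GGTAAACC' > "$workdir/file.fasta"

# Strip everything from the first space onward on header lines only,
# write to a temporary file, and replace the original only if sed succeeded.
sed '/^>/ s/ .*//' "$workdir/file.fasta" > "$workdir/file.fasta.tmp" &&
  mv "$workdir/file.fasta.tmp" "$workdir/file.fasta"

# Show the rewritten file.
cat "$workdir/file.fasta"
```

The && ensures the original file is replaced only when sed exits successfully, so a failed run does not leave file.fasta truncated.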

Upvotes: 1
