Reputation: 75
I have a fasta file, file.fasta, that has the following patterns:
>firstnumber 01abc_numericsequence
CGTAATCG
>secondnumber 01abc_anothernumericsequence
GGTAAACC
and so on, but I'd like the output to be something like:
>firstnumber
CGTAATCG
>secondnumber
GGTAAACC
How can I delete the pattern 01abc and everything that goes after it in each line, and overwrite the file.fasta?
Please, can anyone provide a solution?
Upvotes: 0
Reputation: 75
I've tried
sed 's/01abc*//' file.fasta
The problem is that it removed only the pattern itself and left both _numericsequence and _anothernumericsequence in place. Also, the changes were not saved in file.fasta.
Then, I tried
ex -sc '%s/\(\01abc\).*/\1/ | x' file.fasta
And it removed both _numericsequence and _anothernumericsequence. The problem is that I want to remove the pattern too, and it didn't. Finally, I've tried
ex -sc '%s/\(\ \).*/\1/ | x' file.fasta
And it worked, because the other lines don't contain any spaces in this case.
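A note on the first attempt: in 01abc*, the * applies only to the preceding c, so the expression matches 01ab followed by zero or more c characters and stops there. Matching "everything after" needs .* instead. A minimal sketch of a pattern that removes the suffix as well, assuming every header looks like the two shown above (sample.fasta is a hypothetical scratch file, recreated here only for illustration):

```shell
# Recreate the two sample records from the question (illustrative data only)
printf '%s\n' '>firstnumber 01abc_numericsequence' 'CGTAATCG' \
              '>secondnumber 01abc_anothernumericsequence' 'GGTAAACC' > sample.fasta

# ' *' consumes the space before the pattern, '.*' consumes everything after it
sed 's/ *01abc.*//' sample.fasta
```

Like the original attempt, this only prints to standard output; the input file itself is left unchanged.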
Upvotes: 0
Reputation: 3022
cat fasta
>firstnumber 01abc_numericsequence
CGTAATCG
>secondnumber 01abc_anothernumericsequence
GGTAAACC
awk '/^>/ {$0=$1} 1' fasta
>firstnumber
CGTAATCG
>secondnumber
GGTAAACC
sed '/^>/ s/ .*//' fasta
>firstnumber
CGTAATCG
>secondnumber
GGTAAACC
Both the sed and awk commands remove everything from the first space (inclusive) onward on every line that starts with >.
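Neither command above modifies the file; both print to standard output. Since the question also asks to overwrite file.fasta, here is a hedged sketch of one way to do that: write to a temporary file and move it over the original. The temporary name file.fasta.tmp and the scratch directory are assumptions made for this demo, not part of the question.

```shell
# Work in a throwaway directory so the demo cannot clobber a real file.fasta
workdir=$(mktemp -d) && cd "$workdir"

# Recreate the sample input under the file name used in the question
printf '%s\n' '>firstnumber 01abc_numericsequence' 'CGTAATCG' \
              '>secondnumber 01abc_anothernumericsequence' 'GGTAAACC' > file.fasta

# Portable overwrite: write to a temporary file, then replace the original
# only if sed succeeded (file.fasta.tmp is a hypothetical scratch name)
sed '/^>/ s/ .*//' file.fasta > file.fasta.tmp && mv file.fasta.tmp file.fasta

# With GNU sed, the same edit can be made directly in the file:
#   sed -i '/^>/ s/ .*//' file.fasta
cat file.fasta
```

The temporary-file form works with both GNU and BSD sed. The -i shortcut is simpler, but BSD/macOS sed expects an explicit backup-suffix argument after -i (an empty string for no backup), so the exact invocation depends on the platform.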
Upvotes: 1