shenTTT

Reputation: 57

How to remove the first three characters from the FASTA file header

I have a fasta file like this:

>rna-XM_00001.1 
actact
>rna-XM_00002.1
atcatc

How do I remove the 'rna-' prefix so that it becomes:

>XM_00001.1 
actact
>XM_00002.1
atcatc

Upvotes: 0

Views: 891

Answers (1)

Eric Backus

Reputation: 1924

Assuming what you're showing is the file contents, sed should be able to do this:

sed 's/^>rna-/>/' < inputfile > outputfile

Explanation:

  • The sed script starts with s, which tells sed to perform a substitution
  • The / characters are delimiters that separate the pattern from the replacement
  • The ^ anchors the match to the start of a line
  • The >rna- that follows is the pattern to match at the start of a line
  • The > after the second delimiter is the replacement, so the matched >rna- becomes just >
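To check the command before running it on real data, here is a minimal sketch that rebuilds the two-record sample from the question and applies the same substitution. The file names sample.fasta and renamed.fasta and the variable workdir are made up for illustration; only the sed expression comes from the answer.

```shell
# Build the sample from the question in a temporary directory.
# (workdir, sample.fasta and renamed.fasta are hypothetical names.)
workdir=$(mktemp -d)
printf '>rna-XM_00001.1\nactact\n>rna-XM_00002.1\natcatc\n' > "$workdir/sample.fasta"

# Strip the literal "rna-" prefix from header lines only.
sed 's/^>rna-/>/' < "$workdir/sample.fasta" > "$workdir/renamed.fasta"

# Print the result. Sequence lines are left alone because they
# do not start with ">rna-".
cat "$workdir/renamed.fasta"
# >XM_00001.1
# actact
# >XM_00002.1
# atcatc
```

Writing to a separate output file, as the answer does, leaves the original untouched, so the two files can be compared with diff before the original is replaced.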

If, instead, you want to always remove the first four characters after a > as long as they end in -, you could use:

sed 's/^>...-/>/' < inputfile > outputfile

Explanation:

  • This is similar to above, except the pattern to match at the start of a line is >...-. The pattern is a regexp, where a . matches any single character. So this pattern matches any line starting with >, followed by any three characters, followed by -.
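A small sketch of how the wildcard version behaves, assuming some made-up headers with different prefixes (cds- and gene- are illustrative and do not appear in the question). It shows that the pattern only fires when exactly three characters sit between the > and the -.

```shell
# Three sample records with hypothetical prefixes: two are three
# characters long ("rna", "cds"), one is four ("gene").
generic_out=$(printf '>rna-XM_00001.1\nactact\n>cds-XM_00002.1\natcatc\n>gene-XM_00003.1\nggcggc\n' \
  | sed 's/^>...-/>/')

# The first two headers lose their prefix; the ">gene-" header is
# unchanged because the fifth character of that line is not "-".
printf '%s\n' "$generic_out"
# >XM_00001.1
# actact
# >XM_00002.1
# atcatc
# >gene-XM_00003.1
# ggcggc
```

Because each . matches any character, this form would also strip a prefix that is not letters, as long as the fifth character of the line is -.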

Upvotes: 1
