I have a fasta file that looks like this:
>miR-92|LQNS02278089.1_34108_3p Parhyale hawaiensis 34108_3p
AATTGCACTCGTCCCGGCCTGC
>miR-92|LQNS02278089.1_34106_3p Parhyale hawaiensis 34106_3p
AATTGCACTGATCCCGGCCTGC
>LQNS02136402.1_14821_5p Parhyale hawaiensis 14821_5p
CCGTAAGGCCGAAGACAAGAA
>LQNS02278094.1_35771_5p Parhyale hawaiensis 35771_5p
AAGAATAAGCCCGAGCAAGTCGAT
I want to change the headers to make them look like this:
>miR-92|LQNS02278089.1_34108_3p Parhyale hawaiensis 34108_3p
AATTGCACTCGTCCCGGCCTGC
>miR-92|LQNS02278089.1_34106_3p Parhyale hawaiensis 34106_3p
AATTGCACTGATCCCGGCCTGC
>miR-LQNS02136402.1_14821_5p Parhyale hawaiensis 14821_5p
CCGTAAGGCCGAAGACAAGAA
>miR-LQNS02278094.1_35771_5p Parhyale hawaiensis 35771_5p
AAGAATAAGCCCGAGCAAGTCGAT
Note that not all the headers change, only the last two in the example, where the prefix "miR-" was added.
So far I have been doing it like this:
perl -p -e "s/^>/>miR-/g" seq.fasta
But this will end up with some IDs having miR- added even though they already had it.
I know I can subset the file, apply this only to the headers missing miR- at the beginning and then merge everything back, but I would like an easier way to do it in one line without much manual intervention.
With awk you can get the header lines that don't already start with >miR-:
awk '/^>/ && !/^>miR-/' file
>LQNS02136402.1_14821_5p Parhyale hawaiensis 14821_5p
>LQNS02278094.1_35771_5p Parhyale hawaiensis 35771_5p
and then add miR- only to those headers (the trailing 1 prints every line, changed or not):
awk '/^>/ && !/^>miR-/ {sub(/^>/, ">miR-")} 1' file
>miR-92|LQNS02278089.1_34108_3p Parhyale hawaiensis 34108_3p
AATTGCACTCGTCCCGGCCTGC
>miR-92|LQNS02278089.1_34106_3p Parhyale hawaiensis 34106_3p
AATTGCACTGATCCCGGCCTGC
>miR-LQNS02136402.1_14821_5p Parhyale hawaiensis 14821_5p
CCGTAAGGCCGAAGACAAGAA
>miR-LQNS02278094.1_35771_5p Parhyale hawaiensis 35771_5p
AAGAATAAGCCCGAGCAAGTCGAT
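A quick way to check that the awk approach is safe to re-run is to apply it twice and compare the results. The sketch below assumes a POSIX shell; add_mir_prefix is a hypothetical name, and the condition is anchored to >miR- at the start of the header rather than matching miR- anywhere in the line:

```shell
#!/bin/sh
# Hypothetical wrapper: only header lines (^>) that do not already start
# with ">miR-" are changed; the trailing 1 prints every line.
add_mir_prefix() {
  awk '/^>/ && !/^>miR-/ { sub(/^>/, ">miR-") } 1' "$@"
}

sample='>miR-92|LQNS02278089.1_34108_3p Parhyale hawaiensis 34108_3p
AATTGCACTCGTCCCGGCCTGC
>LQNS02136402.1_14821_5p Parhyale hawaiensis 14821_5p
CCGTAAGGCCGAAGACAAGAA'

# Run once, then run again on the result; an idempotent edit leaves it unchanged.
once=$(printf '%s\n' "$sample" | add_mir_prefix)
twice=$(printf '%s\n' "$once" | add_mir_prefix)
printf '%s\n' "$once"
[ "$once" = "$twice" ] && echo "idempotent"
```

Because already-prefixed headers are skipped, running the command again on its own output (or on a partially processed file) does not stack a second miR-.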
You can use a negative lookahead to match only the lines that start with > but are not followed by miR-. Note the single quotes: inside double quotes, an interactive bash would try to expand !miR- as a history reference.
perl -p -e 's/^>(?!miR-)/>miR-/g' file