Reputation: 187
I have a file of sequences.
>seqA
lakjdsflakjsdlkjals;dkjfal;kdjsfl;aksdjf;lakjsdfl;kjalsdkjf
alsdkjfalskdjf;alsdfj;alkdjsf;lakjsdf;lkajsd
>seqB
fjal;kdjsfla;kdjsflkajdslkjfaghal;sdkjg
>seqC
a;lksdjl;akjsdg;lkjsdfl;kajdsl;kgj;alkdjsg;lkajsdgl
lsdkfja;lksdjf;lakdjsf;lkajsdfl;kjal;sdkfjal;skdjak
sdkjfal;ksdjflk;ahdglkahsdl;kghalk
I'd like to add an incrementing number after each ">".
For example:
Output file:
>1seqA
lakjdsflakjsdlkjals;dkjfal;kdjsfl;aksdjf;lakjsdfl;kjalsdkjf
alsdkjfalskdjf;alsdfj;alkdjsf;lakjsdf;lkajsd
>2seqB
fjal;kdjsfla;kdjsflkajdslkjfaghal;sdkjg
>3seqC
a;lksdjl;akjsdg;lkjsdfl;kajdsl;kgj;alkdjsg;lkajsdgl
lsdkfja;lksdjf;lakdjsf;lkajsdfl;kjal;sdkfjal;skdjak
sdkjfal;ksdjflk;ahdglkahsdl;kghalk
So far, after scouring the internet, I've tried:
awk -F "i=1" '{if (/>/){print $0i++} else print}'
and it didn't do anything. What am I doing wrong?
Thanks!
Upvotes: 6
Views: 2301
Reputation: 247012
A slight variation:
awk -F'>' -v OFS='>' 'NF == 2 {$2 = ++count $2} 1' file
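That uses ">" as the field separator, so a header line such as ">seqA" splits into two fields: an empty one before the ">" and the name after it. The condition NF == 2 therefore selects the header lines, provided no other line contains a ">". Setting OFS to ">" puts the separator back when awk rebuilds the line after $2 is modified.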
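To see the field splitting in action, here is a small sketch. The sample input and the wrapper function name number_headers are made up for illustration; only the awk program itself comes from the command above.

```shell
# Show how -F'>' splits each line: header lines get 2 fields, others get 1.
printf '>seqA\nabc;def\n' | awk -F'>' '{ print NF ": " $0 }'
# prints:
# 2: >seqA
# 1: abc;def

# Hypothetical wrapper around the one-liner so it can be reused on any input.
number_headers() {
  awk -F'>' -v OFS='>' 'NF == 2 {$2 = ++count $2} 1' "$@"
}

printf '>seqA\nabc;def\n>seqB\nghi\n' | number_headers
# prints:
# >1seqA
# abc;def
# >2seqB
# ghi
```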
Upvotes: 1
Reputation: 113924
Try:
awk '/>/{$0 = ">" ++i substr($0, 2)} 1'
For example:
$ awk '/>/{$0 = ">" ++i substr($0, 2)} 1' file
>1seqA
lakjdsflakjsdlkjals;dkjfal;kdjsfl;aksdjf;lakjsdfl;kjalsdkjf
alsdkjfalskdjf;alsdfj;alkdjsf;lakjsdf;lkajsd
>2seqB
fjal;kdjsfla;kdjsflkajdslkjfaghal;sdkjg
>3seqC
a;lksdjl;akjsdg;lkjsdfl;kajdsl;kgj;alkdjsg;lkajsdgl
lsdkfja;lksdjf;lakdjsf;lkajsdfl;kjal;sdkfjal;skdjak
sdkjfal;ksdjflk;ahdglkahsdl;kghalk
/>/{$0 = ">" ++i substr($0, 2)}
This selects lines that contain >. For those lines, we replace the line $0 with > followed by ++i (which is the value of the variable i after it has been incremented) followed by the current line starting at its second character.
1
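This is awk's shorthand for print-the-line: 1 is a pattern that is always true, and since it has no action, awk applies its default action and prints the (possibly modified) line.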
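One caveat: the pattern />/ matches a ">" anywhere in the line, while substr($0, 2) assumes the ">" is the first character. The sample data has no ">" inside the sequence lines, so this doesn't matter there. If your real data might, a possible variant is to anchor the pattern with ^. This is only a sketch; the function name number_headers_anchored and the test input are made up for illustration.

```shell
# Hypothetical wrapper: number only the lines that *start* with ">".
number_headers_anchored() {
  # /^>/ restricts the match to a leading ">"; sub() then replaces that
  # leading ">" with ">" followed by the pre-incremented counter n.
  awk '/^>/ { sub(/^>/, ">" (++n)) } 1' "$@"
}

# The second line contains ">" in the middle and is left untouched.
printf '>seqA\nab>cd\n>seqB\nef\n' | number_headers_anchored
# prints:
# >1seqA
# ab>cd
# >2seqB
# ef
```

With the unanchored version, the line ab>cd would also match, so it would lose its first character and receive a number.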
Upvotes: 4