aeli

Reputation: 187

Awk: append an incrementing number to each line containing a symbol

I have a file of sequences.

>seqA
lakjdsflakjsdlkjals;dkjfal;kdjsfl;aksdjf;lakjsdfl;kjalsdkjf
alsdkjfalskdjf;alsdfj;alkdjsf;lakjsdf;lkajsd
>seqB
fjal;kdjsfla;kdjsflkajdslkjfaghal;sdkjg
>seqC
a;lksdjl;akjsdg;lkjsdfl;kajdsl;kgj;alkdjsg;lkajsdgl
lsdkfja;lksdjf;lakdjsf;lkajsdfl;kjal;sdkfjal;skdjak
sdkjfal;ksdjflk;ahdglkahsdl;kghalk

I'd like to add an incrementing number after each ">".

For example:

Output file:

>1seqA
lakjdsflakjsdlkjals;dkjfal;kdjsfl;aksdjf;lakjsdfl;kjalsdkjf
alsdkjfalskdjf;alsdfj;alkdjsf;lakjsdf;lkajsd
>2seqB
fjal;kdjsfla;kdjsflkajdslkjfaghal;sdkjg
>3seqC
a;lksdjl;akjsdg;lkjsdfl;kajdsl;kgj;alkdjsg;lkajsdgl
lsdkfja;lksdjf;lakdjsf;lkajsdfl;kjal;sdkfjal;skdjak
sdkjfal;ksdjflk;ahdglkahsdl;kghalk

So far, after scouring the internet, I've tried:

awk -F "i=1" '{if (/>/){print $0i++} else print}'

and it didn't do anything. What am I doing wrong?

Thanks!

Upvotes: 6

Views: 2301

Answers (3)

glenn jackman

Reputation: 247012

A slight variation:

awk -F'>' -v OFS='>' 'NF == 2 {$2 = ++count $2} 1' file

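This uses ">" as both the input and output field separator. A header line such as >seqA splits into two fields (an empty one before the ">" and the name after it), so NF == 2 selects the header lines, while lines without ">" have only one field.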
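
To see why the field count works as a condition, it can help to print NF for each line. This is only a sketch with made-up sample input; the function name show_field_counts is hypothetical and not part of the answer.

```shell
# show_field_counts is a hypothetical helper, used only for illustration.
# It prints the number of ">"-separated fields in front of each input line.
show_field_counts() {
    awk -F'>' '{ print NF ": " $0 }' "$@"
}

# Made-up sample: one header line, one sequence line.
printf '>seqA\nabc;def\n' | show_field_counts
# prints:
# 2: >seqA
# 1: abc;def
```

Assigning to $2 makes awk rebuild the line by joining the fields with OFS, which is why the command also sets OFS to ">".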

Upvotes: 1

ctac_

Reputation: 2491

You can try

awk '/^>/{sub(/^>/,">"++i)}1' infile

Upvotes: 4

John1024

Reputation: 113924

Try:

awk '/>/{$0 = ">" ++i substr($0, 2)} 1'

For example:

$ awk '/>/{$0 = ">" ++i substr($0, 2)} 1' file
>1seqA
lakjdsflakjsdlkjals;dkjfal;kdjsfl;aksdjf;lakjsdfl;kjalsdkjf
alsdkjfalskdjf;alsdfj;alkdjsf;lakjsdf;lkajsd
>2seqB
fjal;kdjsfla;kdjsflkajdslkjfaghal;sdkjg
>3seqC
a;lksdjl;akjsdg;lkjsdfl;kajdsl;kgj;alkdjsg;lkajsdgl
lsdkfja;lksdjf;lakdjsf;lkajsdfl;kjal;sdkfjal;skdjak
sdkjfal;ksdjflk;ahdglkahsdl;kghalk

How it works

  • />/{$0 = ">" ++i substr($0, 2)}

    This selects lines that contain >. For those lines, we replace the line $0 with > followed by ++i (the value of the variable i after it has been incremented) followed by the current line from its second character onward. Note that this assumes > is the first character of the line.

  • 1

    This is awk's shorthand for print-the-line: the pattern 1 is always true, and a pattern with no action prints the current line.
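
For readers who prefer it spelled out, the one-liner above can be written in long form. This is a sketch rather than a replacement: the function name number_headers is hypothetical, and the pattern is tightened to /^>/ on the assumption that header lines always begin with >.

```shell
# number_headers is a hypothetical helper name, used only for illustration.
# Assumption: header lines start with ">" in the first column.
number_headers() {
    awk '
        /^>/ {                          # header line
            i++                         # counter starts unset (0), so first header gets 1
            $0 = ">" i substr($0, 2)    # ">" + counter + rest of the line after ">"
        }
        { print }                       # print every line, modified or not
    ' "$@"
}

# Made-up sample input, piped on stdin.
printf '>seqA\nabc\n>seqB\ndef\n' | number_headers
# prints:
# >1seqA
# abc
# >2seqB
# def
```

A file name can be passed instead of piping, for example number_headers file.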

Upvotes: 4
