crazysantaclaus

Reputation: 623

Add length of following line to current line in bash

I have a small sample data set, test1.faa:

>PROKKA_00001_A1@hypothetical@protein
MTIALHLTAVLAFAALAGCGANDSDPGPGGVTVSEARALDQAAEMLEKRGRSPADENAEQAERLRREQAQARTPGQPPEQALQQDGASAPE
>PROKKA_00002_A1@Cystathionine@beta-lyase
MHRFGGMVTAILKGGLDDARRFLERCELFALAESLGGVESLIEHPAIMTHASVPREIREALGISDGLVRLSVGIEDADDLLAELETALA
>PROKKA_00003_A1@hypothetical@protein
MVPIVSAAPVFTLLLTVAVFRRERLTAGRIAAVAVVVPSVILIALGH

and I would like to append the length of the following line to each header line, followed by the sequence line itself, such as:

>PROKKA_00001_A1@hypothetical@protein_92
MTIALHLTAVLAFAALAGCGANDSDPGPGGVTVSEARALDQAAEMLEKRGRSPADENAEQAERLRREQAQARTPGQPPEQALQQDGASAPE

I tried to do this with awk, but it returns the following error:

awk: >PROKKA_00001_A1@hypothetical@protein: No such file or directory

I assume it is related to the > at the beginning? But I need it in the output file.

This is the code I tried:

#!/bin/bash

cat test1.faa | while read line
do
  headerline=$(awk '/>/{print $0}' $line)
  echo -e "this is the headerline \n ${headerline}"
  fastaline=$(awk '!/>/{print $0}' $line)
  echo -e "this is the fastaline \n ${fastaline}"
  fastaline_length=$(awk -v linelength=$fastaline '{print length(linelength)}')
  echo -e "this is length of fastaline \n ${fastaline_length}"
  echo "${headerline}_${fastaline_length}"
  echo $fastaline
done

Any suggestions on how to do this?

Upvotes: 1

Views: 63

Answers (2)

RavinderSingh13

Reputation: 133428

Could you please try the following (assuming your actual Input_file has the same layout as the sample shown, with one sequence line per header):

awk '/^>/{value=$0;next} {print value"_"length($0) ORS $0;value=""}' Input_file
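As for why the loop in the question fails: awk treats every operand after the program text as a file name, so `awk '/>/{print $0}' $line` tries to open a file named after the header line, which produces the "No such file or directory" message. The one-liner above avoids the shell loop entirely and lets awk read the file itself. A commented sketch of the same approach follows; the function name `add_lengths` is made up for illustration, and it assumes exactly one sequence line per header.

```shell
#!/bin/bash
# add_lengths is a hypothetical helper name, not part of the original answer.
# Assumption: every header line is followed by exactly one sequence line.
add_lengths() {
  awk '
    /^>/ { header = $0; next }      # remember the header, print nothing yet
    {
      print header "_" length($0)   # header plus the length of the sequence line
      print                         # the sequence line itself, unchanged
      header = ""
    }
  ' "$@"                            # file operands, or standard input if none
}

# Example with two short made-up records read from standard input
printf '>seq1\nMTIAL\n>seq2\nMV\n' | add_lengths
```

With a real file this would be called as `add_lengths test1.faa > test1_with_lengths.faa` (the output name is only an example).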

Upvotes: 3

oguz ismail

Reputation: 50750

This awk command would do what you want:

awk '
    /^>/ {
        getline next_line
        print $0 "_" length(next_line)
        print next_line
    }
' test1.faa
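One thing to watch with getline: it returns 1 on success, 0 at end of input and -1 on error, and on end of input the variable keeps its previous value. If a header could ever be the last line of the file, a guarded variant may be safer. This is only a sketch; the function name `add_lengths_getline` and the choice to pass a lone header through unchanged are assumptions, not part of the answer above.

```shell
#!/bin/bash
# add_lengths_getline is a hypothetical name used only for this sketch.
add_lengths_getline() {
  awk '
    /^>/ {
      # Only use next_line if getline actually read a line
      if ((getline next_line) > 0) {
        print $0 "_" length(next_line)
        print next_line
      } else {
        print    # header with no sequence line: pass it through unchanged
      }
    }
  ' "$@"
}

# Example: the second header has no sequence line after it
printf '>seq1\nMTIAL\n>orphan\n' | add_lengths_getline
```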

Upvotes: 1
