Reputation: 623
I have a small sample data set, test1.faa:
>PROKKA_00001_A1@hypothetical@protein
MTIALHLTAVLAFAALAGCGANDSDPGPGGVTVSEARALDQAAEMLEKRGRSPADENAEQAERLRREQAQARTPGQPPEQALQQDGASAPE
>PROKKA_00002_A1@Cystathionine@beta-lyase
MHRFGGMVTAILKGGLDDARRFLERCELFALAESLGGVESLIEHPAIMTHASVPREIREALGISDGLVRLSVGIEDADDLLAELETALA
>PROKKA_00003_A1@hypothetical@protein
MVPIVSAAPVFTLLLTVAVFRRERLTAGRIAAVAVVVPSVILIALGH
I would like to append the length of the sequence line that follows each header to that header line, and then print the sequence line itself, for example:
>PROKKA_00001_A1@hypothetical@protein_92
MTIALHLTAVLAFAALAGCGANDSDPGPGGVTVSEARALDQAAEMLEKRGRSPADENAEQAERLRREQAQARTPGQPPEQALQQDGASAPE
I tried to do this with awk, but it returns the following error:
awk: >PROKKA_00001_A1@hypothetical@protein: No such file or directory
I assume it is related to the > at the beginning of the line? But I need it in the output file.
This is the code I tried:
#!/bin/bash
cat test1.faa | while read line
do
    headerline=$(awk '/>/{print $0}' $line)
    echo -e "this is the headerline \n ${headerline}"
    fastaline=$(awk '!/>/{print $0}' $line)
    echo -e "this is the fastaline \n ${fastaline}"
    fastaline_length=$(awk -v linelength=$fastaline '{print length(linelength)}')
    echo -e "this is length of fastaline \n ${fastaline_length}"
    echo "${headerline}_${fastaline_length}"
    echo $fastaline
done
Any suggestions on how to do this?
Upvotes: 1
Views: 63
Reputation: 133428
Could you please try the following? It assumes that your actual Input_file has the same layout as the sample shown, with each header followed by exactly one sequence line.
awk '/^>/{value=$0;next} {print value"_"length($0) ORS $0;value=""}' Input_file
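As for the error in the question: awk treats any argument after the program text as a file name, so passing $line made it try to open a file called >PROKKA_00001_A1@hypothetical@protein. If a sequence can wrap over several lines (an assumption, the sample does not show this), the one-liner above would need to accumulate the length until the next header. A rough sketch of that idea, using a made-up demo file and a hypothetical helper function emit:

```shell
# Hypothetical demo input; the wrapped second sequence is an assumption,
# not something taken from the sample in the question.
tmp=$(mktemp)
printf '%s\n' '>seq1' 'MTIAL' '>seq2' 'MHRF' 'GG' > "$tmp"

result=$(awk '
function emit() {                 # print the buffered record, if there is one
    if (header != "") {
        print header "_" len      # header with total sequence length appended
        printf "%s", seq          # sequence lines exactly as they were read
    }
}
/^>/ { emit(); header = $0; len = 0; seq = ""; next }   # new record starts
     { len += length($0); seq = seq $0 ORS }            # buffer a sequence line
END  { emit() }                                         # flush the last record
' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

For input where every sequence is on a single line, this should give the same result as the one-liner; it only differs when a sequence is wrapped.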
Upvotes: 3
Reputation: 50750
This awk command should do what you want:
awk '
/^>/ {
    getline next_line
    print $0 "_" length(next_line)
    print next_line
}
' test1.faa
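One caveat with getline: it returns 1 on success, 0 at end of input and -1 on error, and on failure next_line keeps whatever it held before. A hedged variant that checks the return value might look like the sketch below; the demo file and the choice to print _0 for a header with no sequence are assumptions made for illustration only.

```shell
# Hypothetical demo input: the last header has no sequence line after it.
tmp=$(mktemp)
printf '%s\n' '>seq1' 'MTIAL' '>seq2' > "$tmp"

result=$(awk '
/^>/ {
    if ((getline next_line) > 0) {      # a following line was read
        print $0 "_" length(next_line)
        print next_line
    } else {                            # end of input (or read error)
        print $0 "_0"                   # assumed handling: report length 0
    }
}
' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

On a well-formed file such as test1.faa the else branch is never taken, so the output matches the shorter version above.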
Upvotes: 1