Reputation: 33
I have some files as follows.
file1.txt
145 THR P Dl -91.52 173.90 179.36 66.67 999.99 999.99 999.99
146 SER C Cl -125.16 155.03 178.68 67.76 999.99 999.99 999.99
147 MET T Ee -52.96 -35.79 -179.13 -65.71 -58.28 -60.34 999.99
151 TYR C Ck -125.69 145.40 -179.22 -54.88 -59.25 999.99 999.99
156 ARG E Bk -136.06 137.44 -179.24 -55.85 173.98 48.70 -165.24
158 ILE E Dj -98.77 116.42 -179.37 -51.55 -54.79 999.99 999.99
file2.txt
33 PHE C Ch -120.45 41.86 -177.95 -56.61 -71.40 999.99 999.99
36 VAL C Ck -119.10 147.98 -177.54 94.59 999.99 999.99 999.99
41 LEU H Ee -61.78 -50.08 179.33 175.84 50.72 999.99 999.99
42 THR H Ee -60.72 -40.55 178.79 -65.97 999.99 999.99 999.99
I need to extract the second column when the third column is H. I used the following program to do this:
awk '{
if (FNR == 1 ) print newline ">" FILENAME
if ($3 == "H") {
newline="\n";
printf $2
}
}
END { printf "\n"}' *.txt>output
Output of the above program:
>file1.txt
THRSERMETTYRARGILE
>file2.txt
PHEVALLEUTHR
Instead of the above, I would like output like the following, with each three-letter code converted to its one-letter code using the table below. For example, THR becomes T and SER becomes S.
>file1.txt
TSMYRI
>file2.txt
FVLT
ALA A
ARG R
ASN N
ASP D
CYS C
GLU E
GLN Q
GLY G
HIS H
ILE I
LEU L
LYS K
MET M
PHE F
PRO P
SER S
THR T
TRP W
TYR Y
VAL V
Your help would be greatly appreciated!
Upvotes: 3
Views: 77
Reputation: 189387
Use an associative array to map each label to an abbreviation.
BEGIN { a["THR"]="T"; a["TYR"]="Y"; ... }
Then simply printf a[$2] instead of $2.
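For completeness, here is a minimal sketch of how the pieces could fit together, filling the array from the table in the question. The function name three_to_one and the "X" fallback for codes missing from the table are my own assumptions, not part of the original program; the rest follows the awk logic from the question.

```shell
# Hypothetical wrapper (the name is an assumption) so the awk program can be reused.
three_to_one() {
  awk '
    BEGIN {
      # Three-letter to one-letter map, taken from the table in the question.
      n = split("ALA A ARG R ASN N ASP D CYS C GLU E GLN Q GLY G HIS H ILE I LEU L LYS K MET M PHE F PRO P SER S THR T TRP W TYR Y VAL V", t, " ")
      for (i = 1; i < n; i += 2) a[t[i]] = t[i + 1]
    }
    # Start of each input file: print a header line with the file name.
    FNR == 1 { print newline ">" FILENAME }
    # Rows whose third column is H: print the mapped letter.
    # Assumption: codes not in the table are printed as X.
    $3 == "H" { newline = "\n"; printf "%s", (($2 in a) ? a[$2] : "X") }
    END { printf "\n" }
  ' "$@"
}
```

You would then call it much like the original command, for example three_to_one *.txt > output. Using printf "%s" rather than passing the data as the format string avoids surprises if a field ever contains a % character.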
Upvotes: 2
Reputation: 2615
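If I understand correctly, try the following. Note that it takes the first letter of the three-letter code, which matches the table only for some residues (for example, it gives A for ARG rather than R):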
gawk 'FNR==1 {print "\n>" FILENAME} $3=="H" {printf substr($2,1,1)} END {print ""}' *.txt
Upvotes: 0