user3164493

Reputation: 33

extract data from a file

I have some files as follows.

file1.txt

145 THR  P Dl   -91.52   173.90   179.36    66.67   999.99   999.99   999.99
146 SER  C Cl  -125.16   155.03   178.68    67.76   999.99   999.99   999.99
147 MET  T Ee   -52.96   -35.79  -179.13   -65.71   -58.28   -60.34   999.99
151 TYR  C Ck  -125.69   145.40  -179.22   -54.88   -59.25   999.99   999.99
156 ARG  E Bk  -136.06   137.44  -179.24   -55.85   173.98    48.70  -165.24
158 ILE  E Dj   -98.77   116.42  -179.37   -51.55   -54.79   999.99   999.99

file2.txt

33 PHE  C Ch  -120.45    41.86  -177.95   -56.61   -71.40   999.99   999.99
36 VAL  C Ck  -119.10   147.98  -177.54    94.59   999.99   999.99   999.99
41 LEU  H Ee   -61.78   -50.08   179.33   175.84    50.72   999.99   999.99
42 THR  H Ee   -60.72   -40.55   178.79   -65.97   999.99   999.99   999.99

I need to extract the second column when the third column is H. I used the following program to do this:

awk '{
  if (FNR == 1) print newline ">" FILENAME
  if ($3 == "H") {
    newline = "\n"
    printf $2
  }
}
END { printf "\n" }' *.txt > output

Output of the above program

>file1.txt
THRSERMETTYRARGILE
>file2.txt
PHEVALLEUTHR

Instead of the output above, I would like to get the output below, converting each name with the table that follows it. For example, THR becomes T, SER becomes S, and so on.

>file1.txt
TSMYRI
>file2.txt
FVLT



ALA A
ARG R
ASN N
ASP D
CYS C
GLU E
GLN Q
GLY G
HIS H
ILE I
LEU L
LYS K
MET M
PHE F
PRO P
SER S
THR T
TRP W
TYR Y
VAL V

Your help would be greatly appreciated!

Upvotes: 3

Views: 77

Answers (2)

tripleee

Reputation: 189387

Use an associative array to map each label to an abbreviation.

BEGIN { a["THR"]="T"; a["TYR"]="Y"; ... }

Then simply printf a[$2] instead of $2.
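
For reference, here is a minimal sketch of that idea written out in full, using the table from the question. The scratch directory and the trimmed sample files (first four columns only) are assumptions made so the snippet runs on its own. It keeps the question's `$3 == "H"` test; drop that condition to convert every row, which is what the sample output in the question shows.

```shell
# Work in a scratch directory with trimmed copies of the question's sample data.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf '%s\n' '145 THR  P Dl' '146 SER  C Cl' '147 MET  T Ee' > file1.txt
printf '%s\n' '33 PHE  C Ch' '41 LEU  H Ee' '42 THR  H Ee' > file2.txt

out=$(awk '
BEGIN {
    # Three-letter -> one-letter table from the question, stored as pairs.
    codes = "ALA A ARG R ASN N ASP D CYS C GLU E GLN Q GLY G HIS H ILE I"
    codes = codes " LEU L LYS K MET M PHE F PRO P SER S THR T TRP W TYR Y VAL V"
    n = split(codes, t, " ")
    for (i = 1; i < n; i += 2) a[t[i]] = t[i + 1]
}
# Header for each file; from the second file on, end the previous line first.
FNR == 1  { printf "%s>%s\n", sep, FILENAME; sep = "\n" }
# Print the mapped one-letter code instead of the three-letter name.
$3 == "H" { printf "%s", a[$2] }
END       { printf "\n" }
' file1.txt file2.txt)

printf '%s\n' "$out"
```

Using `printf "%s", a[$2]` rather than `printf a[$2]` keeps the value from being interpreted as a format string. With the trimmed sample above, file1.txt has no H rows, so its sequence line is empty.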

Upvotes: 2

Eran Ben-Natan

Reputation: 2615

If I understand correctly, try:

gawk 'FNR==1 {print "\n>" FILENAME} $3=="H" {printf substr($2,1,1)} END {print ""}' *.txt
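
Note that `substr($2,1,1)` agrees with the question's table only where the one-letter code is the initial of the name; ARG, PHE and TYR, for example, map to R, F and Y in the table. A possible variant, sketched here under the assumption that the table is saved in a file (the name codes.map is hypothetical, and only a few entries are included), loads the table first and then looks each name up:

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1
# codes.map is a hypothetical file holding a few lines of the question's table.
printf '%s\n' 'ARG R' 'LEU L' 'PHE F' 'THR T' 'TYR Y' > codes.map
# Trimmed copy of the question's file2.txt (first four columns only).
printf '%s\n' '33 PHE  C Ch' '41 LEU  H Ee' '42 THR  H Ee' > file2.txt

out=$(awk '
# First file on the command line: load name -> code pairs, then skip the rest.
NR == FNR { code[$1] = $2; next }
# Start of each data file: close the previous sequence line, print the header.
FNR == 1  { if (seen++) printf "\n"; print ">" FILENAME }
# Rows whose third column is H: emit the looked-up one-letter code.
$3 == "H" { printf "%s", code[$2] }
END       { printf "\n" }
' codes.map file2.txt)

printf '%s\n' "$out"
```

The `NR == FNR` test is true only while awk is reading the first file, so the table must come before the data files on the command line.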

Upvotes: 0
