Reputation: 127
I have searched everywhere but I still don't have the answer that I'm looking for. I have the following pdb file (file1):
ATOM 1 N SER A 1 31.848 -5.217 38.114 1.00 39.55
ATOM 2 CA SER A 1 31.668 -5.130 36.630 1.00 40.83
ATOM 3 C SER A 1 30.991 -3.833 36.183 1.00 40.24
ATOM 4 O SER A 1 30.868 -2.883 36.961 1.00 40.08
ATOM 5 CB SER A 1 30.854 -6.329 36.118 1.00 41.46
ATOM 6 OG SER A 1 31.600 -7.531 36.190 1.00 44.54
ATOM 7 N THR A 2 30.605 -3.796 34.906 1.00 39.92
ATOM 8 CA THR A 2 29.920 -2.658 34.286 1.00 38.97
ATOM 9 C THR A 2 28.542 -3.116 33.777 1.00 38.40
ATOM 10 O THR A 2 27.815 -2.341 33.141 1.00 38.79
ATOM 11 CB THR A 2 30.734 -2.067 33.086 1.00 39.67
ATOM 12 OG1 THR A 2 31.045 -3.101 32.139 1.00 38.83
ATOM 13 CG2 THR A 2 32.020 -1.403 33.566 1.00 38.83
I also have the following file after some calculation using gfortran (file2):
1 0.14364205034979632
2 0.50527753403393372
What I'd like to do is replace column 11 of file1 with column 2 of file2 wherever column 6 of file1 equals column 1 of file2. Essentially, the output should look like this:
ATOM 1 N SER A 1 31.848 -5.217 38.114 1.00 0.14364205034979632
ATOM 2 CA SER A 1 31.668 -5.130 36.630 1.00 0.14364205034979632
ATOM 3 C SER A 1 30.991 -3.833 36.183 1.00 0.14364205034979632
ATOM 4 O SER A 1 30.868 -2.883 36.961 1.00 0.14364205034979632
ATOM 5 CB SER A 1 30.854 -6.329 36.118 1.00 0.14364205034979632
ATOM 6 OG SER A 1 31.600 -7.531 36.190 1.00 0.14364205034979632
ATOM 7 N THR A 2 30.605 -3.796 34.906 1.00 0.50527753403393372
ATOM 8 CA THR A 2 29.920 -2.658 34.286 1.00 0.50527753403393372
ATOM 9 C THR A 2 28.542 -3.116 33.777 1.00 0.50527753403393372
ATOM 10 O THR A 2 27.815 -2.341 33.141 1.00 0.50527753403393372
ATOM 11 CB THR A 2 30.734 -2.067 33.086 1.00 0.50527753403393372
ATOM 12 OG1 THR A 2 31.045 -3.101 32.139 1.00 0.50527753403393372
ATOM 13 CG2 THR A 2 32.020 -1.403 33.566 1.00 0.50527753403393372
I have the following code:
gawk '
FNR==NR { pdb[NR]=$0; next }
{
split(pdb[FNR],flds,FS,seps)
while ( flds[6]==$1 ) {
flds[11]=$2
for (i=1;i in flds;i++)
printf "%s%s", flds[i], seps[i]
print ""
}
}
' "file1" "file2" > "output.pdb"
It gets the job done for the first line of file1 and it keeps the spacing consistent. The problem is that it doesn't proceed to the next lines, and the first line is repeated endlessly. Could anyone be so kind as to help me out?
Thanks! I'd treat you to some beer :)
Upvotes: 1
Views: 115
Reputation: 203209
This is an incredibly common problem; I'm surprised you couldn't find a solution:
$ awk 'NR==FNR{a[$1]=$2;next} {$11=a[$6]} 1' file2 file1
ATOM 1 N SER A 1 31.848 -5.217 38.114 1.00 0.14364205034979632
ATOM 2 CA SER A 1 31.668 -5.130 36.630 1.00 0.14364205034979632
ATOM 3 C SER A 1 30.991 -3.833 36.183 1.00 0.14364205034979632
ATOM 4 O SER A 1 30.868 -2.883 36.961 1.00 0.14364205034979632
ATOM 5 CB SER A 1 30.854 -6.329 36.118 1.00 0.14364205034979632
ATOM 6 OG SER A 1 31.600 -7.531 36.190 1.00 0.14364205034979632
ATOM 7 N THR A 2 30.605 -3.796 34.906 1.00 0.50527753403393372
ATOM 8 CA THR A 2 29.920 -2.658 34.286 1.00 0.50527753403393372
ATOM 9 C THR A 2 28.542 -3.116 33.777 1.00 0.50527753403393372
ATOM 10 O THR A 2 27.815 -2.341 33.141 1.00 0.50527753403393372
ATOM 11 CB THR A 2 30.734 -2.067 33.086 1.00 0.50527753403393372
ATOM 12 OG1 THR A 2 31.045 -3.101 32.139 1.00 0.50527753403393372
ATOM 13 CG2 THR A 2 32.020 -1.403 33.566 1.00 0.50527753403393372
If you care about preserving the white space:
$ awk 'NR==FNR{a[$1]=$2;next} {sub(/[^[:space:]]+[[:space:]]*$/,a[$6])} 1' file2 file1
ATOM 1 N SER A 1 31.848 -5.217 38.114 1.00 0.14364205034979632
ATOM 2 CA SER A 1 31.668 -5.130 36.630 1.00 0.14364205034979632
ATOM 3 C SER A 1 30.991 -3.833 36.183 1.00 0.14364205034979632
ATOM 4 O SER A 1 30.868 -2.883 36.961 1.00 0.14364205034979632
ATOM 5 CB SER A 1 30.854 -6.329 36.118 1.00 0.14364205034979632
ATOM 6 OG SER A 1 31.600 -7.531 36.190 1.00 0.14364205034979632
ATOM 7 N THR A 2 30.605 -3.796 34.906 1.00 0.50527753403393372
ATOM 8 CA THR A 2 29.920 -2.658 34.286 1.00 0.50527753403393372
ATOM 9 C THR A 2 28.542 -3.116 33.777 1.00 0.50527753403393372
ATOM 10 O THR A 2 27.815 -2.341 33.141 1.00 0.50527753403393372
ATOM 11 CB THR A 2 30.734 -2.067 33.086 1.00 0.50527753403393372
ATOM 12 OG1 THR A 2 31.045 -3.101 32.139 1.00 0.50527753403393372
ATOM 13 CG2 THR A 2 32.020 -1.403 33.566 1.00 0.50527753403393372
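If some residues in file1 might have no entry in file2, the first command above would blank out column 11 for those lines. Here is a minimal sketch of a guard, assuming you would rather keep the original value in that case. The temp directory, the shortened values in file2 and the extra GLY line are made up purely for illustration:

```shell
# Hypothetical sample data, written to a temp dir so nothing is overwritten.
tmp=$(mktemp -d)
printf '%s\n' \
  'ATOM 1 N SER A 1 31.848 -5.217 38.114 1.00 39.55' \
  'ATOM 7 N THR A 2 30.605 -3.796 34.906 1.00 39.92' \
  'ATOM 14 N GLY A 3 27.900 -4.100 33.000 1.00 37.10' > "$tmp/file1"
printf '%s\n' '1 0.14' '2 0.50' > "$tmp/file2"

# Only overwrite $11 when $6 is a key in the lookup array;
# "($6 in a)" tests membership without creating an empty entry.
result=$(awk 'NR==FNR { a[$1] = $2; next }
              ($6 in a) { $11 = a[$6] }
              1' "$tmp/file2" "$tmp/file1")
printf '%s\n' "$result"
rm -rf "$tmp"
```

With this sample, the two lines whose residue number appears in file2 get the new value, and the residue 3 line passes through unchanged. Note that assigning to $11 still rebuilds the record with single-space separators, so the whitespace-preserving sub() variant would need the same kind of guard if spacing matters.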
Upvotes: 1
Reputation: 23667
This solution is gawk-specific (see "Defining Fields by Content" in the gawk manual) and assumes that file2 has two columns separated by a single space, so that the output matches the requirement:
awk 'BEGIN {FPAT = "([[:space:]]*[[:alnum:][:punct:][:digit:]]+)"; OFS = "";} FNR==NR{a[$1]=$2; next} {$11=a[$6+0]} {print}' file2 file1
The +0 in {$11=a[$6+0]} forces $6 into numeric context, so that values of $6 like " 1" and " 2" (which keep their leading whitespace under this FPAT) match the keys "1" and "2" in array a, instead of failing a string comparison. (Thanks to @Ed Morton for the explanation.)
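To see why the +0 matters, here is a small sketch that should run in any POSIX awk. The variable key with its leading space is just a stand-in for what $6 looks like under the FPAT above, and the names are illustrative only:

```shell
result=$(awk 'BEGIN {
    a["1"] = "found"
    key = " 1"    # field text with leading whitespace, as FPAT would yield
    # As a string subscript, " 1" is a different key from "1".
    print ((key in a) ? "string: match" : "string: no match")
    # key + 0 is the number 1, which is converted to the subscript "1".
    print (((key + 0) in a) ? "numeric: match" : "numeric: no match")
}')
printf '%s\n' "$result"
```

The first test reports no match and the second reports a match, which is the difference the +0 makes in the command above.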
Upvotes: 0
Reputation: 88563
I assume that file1 and file2 are both sorted on the join field (column 6 of file1, column 1 of file2), as join requires.
join -1 6 -2 1 file1 file2 -o 1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,1.10,2.2 | column -t
Output:
ATOM 1 N SER A 1 31.848 -5.217 38.114 1.00 0.14364205034979632
ATOM 2 CA SER A 1 31.668 -5.130 36.630 1.00 0.14364205034979632
ATOM 3 C SER A 1 30.991 -3.833 36.183 1.00 0.14364205034979632
ATOM 4 O SER A 1 30.868 -2.883 36.961 1.00 0.14364205034979632
ATOM 5 CB SER A 1 30.854 -6.329 36.118 1.00 0.14364205034979632
ATOM 6 OG SER A 1 31.600 -7.531 36.190 1.00 0.14364205034979632
ATOM 7 N THR A 2 30.605 -3.796 34.906 1.00 0.50527753403393372
ATOM 8 CA THR A 2 29.920 -2.658 34.286 1.00 0.50527753403393372
ATOM 9 C THR A 2 28.542 -3.116 33.777 1.00 0.50527753403393372
ATOM 10 O THR A 2 27.815 -2.341 33.141 1.00 0.50527753403393372
ATOM 11 CB THR A 2 30.734 -2.067 33.086 1.00 0.50527753403393372
ATOM 12 OG1 THR A 2 31.045 -3.101 32.139 1.00 0.50527753403393372
ATOM 13 CG2 THR A 2 32.020 -1.403 33.566 1.00 0.50527753403393372
Update:
With bash's printf:
printf "%s %6.d %-3s %s %s %s %s %s %s %s %s\n" $(join -1 6 -2 1 file1 file2 -o 1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,1.10,2.2)
Output:
ATOM 1 N SER A 1 31.848 -5.217 38.114 1.00 0.14364205034979632
ATOM 2 CA SER A 1 31.668 -5.130 36.630 1.00 0.14364205034979632
ATOM 3 C SER A 1 30.991 -3.833 36.183 1.00 0.14364205034979632
ATOM 4 O SER A 1 30.868 -2.883 36.961 1.00 0.14364205034979632
ATOM 5 CB SER A 1 30.854 -6.329 36.118 1.00 0.14364205034979632
ATOM 6 OG SER A 1 31.600 -7.531 36.190 1.00 0.14364205034979632
ATOM 7 N THR A 2 30.605 -3.796 34.906 1.00 0.50527753403393372
ATOM 8 CA THR A 2 29.920 -2.658 34.286 1.00 0.50527753403393372
ATOM 9 C THR A 2 28.542 -3.116 33.777 1.00 0.50527753403393372
ATOM 10 O THR A 2 27.815 -2.341 33.141 1.00 0.50527753403393372
ATOM 11 CB THR A 2 30.734 -2.067 33.086 1.00 0.50527753403393372
ATOM 12 OG1 THR A 2 31.045 -3.101 32.139 1.00 0.50527753403393372
ATOM 13 CG2 THR A 2 32.020 -1.403 33.566 1.00 0.50527753403393372
Upvotes: 1