Reputation: 1
I need help editing my awk script. Here is the original version:
BEGIN { printf ("CRYST1 200.000 200.000 200.000 90.00 90.00 90.00 P 1 1\n")
maxatoms=1000
natom=0
found_struct = 0
found_bond = 0
}
{
if( NF == 5 )
{
foundff=0
natom++
fftype[natom]="UNKNOWN"
if ($1 ~ /CT/)
{
fftype[natom] = "C"
foundff=1
}
else if ($1 ~ /OH/)
{
fftype[natom] = "O"
foundff=1
}
else if ($1 ~ /HC/)
{
fftype[natom] = "H"
foundff=1
}
else if ($1 ~ /N/)
{
fftype[natom] = "N"
foundff=1
}
else if ($1 ~ /H1/)
{
fftype[natom] = "H"
foundff=1
}
else if ($1 ~ /HO/)
{
fftype[natom] = "H"
foundff=1
}
else if ($1 = "C")
{
fftype[natom] = "C"
foundff=1
}
else if ($1 = "O")
{
fftype[natom] = "O"
foundff=1
}
next
x[natom] = $1
y[natom] = $2
z[natom] = $3
if (foundff == 0)
printf("PROBLEM : Atom ff type %s not known\n", $6)
}
}
END {
for (iatom=1; iatom <= natom; iatom++)
{
printf("HETATM %d %2s %d %14.9f %14.9f %14.9f\n" ,
iatom, fftype[iatom], iatom, x[iatom], y[iatom], z[iatom])
}
printf ("END\n")
}
And this is the type of file I am working with:
0 3 186 200 75202
timestep 500 186 0 3 0.002000 1.000000
40.0000000000 0.0000000000 0.0000000000
-0.0000000034 40.0000000000 0.0000000000
-0.0000000034 -0.0000000034 40.0000000000
CT_1 1 12.011000 0.061000 1.087513
-1.961325738 1.828501682 -8.933652557
CT_1 2 12.011000 0.061000 0.789711
-3.851025437 3.495427316 -10.05849230
CT_1 3 12.011000 0.061000 0.581330
-5.804493575 4.589489777 -8.369482861
etc.
I would like to get this as an output:
CRYST1 200.000 200.000 200.000 90.00 90.00 90.00 P 1 1
HETATM 1 C 1 -1.961325738 1.828501682 -8.933652557
HETATM 2 C 2 -3.851025437 3.495427316 -10.05849230
HETATM 3 C 3 -5.804493575 4.589489777 -8.369482861
etc.
But the coordinates are not being picked up correctly. They sit on the line after each atom record (for example, the line after CT_1 1 12.011000 0.061000 1.087513). Could you please have a look and suggest a solution?
Upvotes: 0
Views: 113
Reputation: 7610
It is not quite clear how you would like to process the atoms, but I suggest using the getline command to read the next line whenever a line such as CT_1 is found, so that each atom can be processed as soon as it is seen. It is also not clear from the description whether the first field always contains a _ followed by a number; I assume that there is a _ in it.
Something like this:
awk 'BEGIN { print "CRYST1 200.000 200.000 200.000 90.00 90.00 90.00 P 1 1" }
NR < 6 {next}
/^(CT|OH|HC|N|H1|HO|C|O)_/{a=$1;getline;++n;print "HETATM",n,substr(a,1,1),n,$1,$2,$3;next}
{ print "Bad line! ("$0")" }
' <<EOT
0 3 186 200 75202
timestep 500 186 0 3 0.002000 1.000000
40.0000000000 0.0000000000 0.0000000000
-0.0000000034 40.0000000000 0.0000000000
-0.0000000034 -0.0000000034 40.0000000000
CT_1 1 12.011000 0.061000 1.087513
-1.961325738 1.828501682 -8.933652557
CT_1 2 12.011000 0.061000 0.789711
-3.851025437 3.495427316 -10.05849230
CT_1 3 12.011000 0.061000 0.581330
-5.804493575 4.589489777 -8.369482861
OH_1 3 12.011000 0.061000 0.581330
-5.804493575 4.589489777 -8.369482861
HC_1 3 12.011000 0.061000 0.581330
-5.804493575 4.589489777 -8.369482861
QW_1 3 12.011000 0.061000 0.581330
-5.804493575 4.589489777 -8.369482861
EOT
Output:
CRYST1 200.000 200.000 200.000 90.00 90.00 90.00 P 1 1
HETATM 1 C 1 -1.961325738 1.828501682 -8.933652557
HETATM 2 C 2 -3.851025437 3.495427316 -10.05849230
HETATM 3 C 3 -5.804493575 4.589489777 -8.369482861
HETATM 4 O 4 -5.804493575 4.589489777 -8.369482861
HETATM 5 H 5 -5.804493575 4.589489777 -8.369482861
Bad line! (QW_1 3 12.011000 0.061000 0.581330)
Bad line! (-5.804493575 4.589489777 -8.369482861)
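One thing the one-liner above does not do is check whether getline actually read a line. getline returns 1 on success, 0 at end of input and -1 on a read error, so a file that ends right after an atom record would silently reuse the atom line as coordinates. A minimal sketch of the same approach with that check added follows; the function name hetatm_getline and the wording of the error message are my own assumptions, not part of the original answer.

```shell
# Hypothetical wrapper name; reads the trajectory from stdin or from file arguments.
hetatm_getline() {
  awk '
    BEGIN { print "CRYST1 200.000 200.000 200.000 90.00 90.00 90.00 P 1 1" }
    NR < 6 { next }    # skip the five header lines, as in the one-liner above
    /^(CT|OH|HC|N|H1|HO|C|O)_/ {
      a = $1
      # getline returns 1 on success, 0 at end of input, -1 on a read error
      if ((getline) <= 0) {
        print "Truncated input: no coordinates after " a
        exit 1
      }
      n++
      print "HETATM", n, substr(a, 1, 1), n, $1, $2, $3
      next
    }
    { print "Bad line! (" $0 ")" }
  ' "$@"
}
```

Called as hetatm_getline file, it behaves like the one-liner on well-formed input and stops with a non-zero exit status if the last atom record has no coordinate line.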
Upvotes: 1
Reputation: 85775
I wouldn't go with getline. Try this:
awk '/^(H[1CO]|N|C|O)/{printf "HETATM %d %s %d ",++i,substr($1,1,1),i;p=1;next}p' file
HETATM 1 C 1 -1.961325738 1.828501682 -8.933652557
HETATM 2 C 2 -3.851025437 3.495427316 -10.05849230
HETATM 3 C 3 -5.804493575 4.589489777 -8.369482861
Just add the BEGIN block to print the header and you should be sorted.
BEGIN { print "CRYST1 200.000 200.000 200.000 90.00 90.00 90.00 P 1 1" }
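Put together, the header and the flag-based rules might look like the sketch below. It is one possible arrangement, not the only one: the function name traj_to_pdb and the variable names elem and want are made up for illustration, the NF == 5 test is borrowed from the script in the question, and the trailing END record is included only because the question's script prints one.

```shell
# Hypothetical wrapper name; reads the trajectory from stdin or from file arguments.
traj_to_pdb() {
  awk '
    BEGIN { print "CRYST1 200.000 200.000 200.000 90.00 90.00 90.00 P 1 1" }
    # Atom record: five fields and a known type prefix. Remember the element
    # letter and flag that the next line should hold the coordinates.
    NF == 5 && /^(H[1CO]|N|C|O)/ { elem = substr($1, 1, 1); want = 1; next }
    # Coordinate line directly after an atom record.
    want {
      n++
      printf "HETATM %d %s %d %s %s %s\n", n, elem, n, $1, $2, $3
      want = 0
    }
    END { print "END" }
  ' "$@"
}
```

Because the flag is reset after each coordinate line, lines that do not follow an atom record (such as the header and box vectors) are skipped rather than printed. The coordinates are printed as strings, so their digits are passed through unchanged.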
Upvotes: 1
Reputation: 2243
perl -ane ' if ($printNow == 1) {printf("HETATM %d %2s %d %14.9f %14.9f %14.9f\n" ,$i,$type,$i,$F[0],$F[1],$F[2]);$printNow =0;}; if (scalar @F == 5 and (/^CT/ or /^OH/ or /^HC/ or /^N/)) {$i++; $printNow =1 ; $type =substr($_,0,1)}' filename
Hope this works.
Upvotes: 0