Reputation: 12587
I have some sample data which looks like:
ATOM 973 CG ARG A 61 -21.593 8.884 69.770 1.00 25.13 C
ATOM 974 CD ARG A 61 -21.610 7.433 69.314 1.00 23.44 C
ATOM 975 NE ARG A 61 -21.047 7.452 67.937 1.00 12.13 N
I want to replace the value in the 6th column, and only the 6th column, with that value plus an offset, which in the case above is 308.
So 61 + 308 = 369, and the 61 in the 6th column should be replaced by 369.
I can't use str.split() on the line because the line spacing is very important.
I have tried using str.replace(), but the values in column 2 can also overlap with column 6.
I also tried reversing the line and using str.replace(), but the values in columns 7, 8, 9, 10 and 11 can overlap with the string to be replaced.
The ugly code I have so far (which works except when the value also occurs in columns 7, 8, 9, 10 and/or 11) is:
with open('2kqx.pdb', 'r') as inf, open('2kqx_renumbered.pdb', 'w') as outf:
    for line in inf:
        if line.startswith('ATOM'):
            segs = line.split()
            if segs[4] == 'A':
                offset = 308
                number = segs[5][::-1]
                replacement = str((int(segs[5])+offset))[::-1]
                print number[::-1],replacement
                line_rev = line[::-1]
                replaced_line = line_rev.replace(number,replacement,1)
                print line
                print replaced_line[::-1]
                outf.write(replaced_line[::-1])
The code above produced the output below. As you can see, in the second line the 6th column is unchanged and the replacement landed in column 7 instead. I thought that reversing the string would bypass the potential overlap with column 2, but I forgot about the other columns and I don't really know how to get around it.
ATOM 973 CG ARG A 369 -21.593 8.884 69.770 1.00 25.13 C
ATOM 974 CD ARG A 61 -21.3690 7.433 69.314 1.00 23.44 C
ATOM 975 NE ARG A 369 -21.047 7.452 67.937 1.00 12.13 N
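The failure on the second line can be reproduced in isolation. The sketch below applies the same reverse, replace once, reverse back idea to the whitespace-collapsed sample line shown above; `reversed_replace` is a hypothetical helper name, not part of the original script.

```python
def reversed_replace(line, old, new):
    """Replace the right-most occurrence of `old` by working on the reversed line."""
    return line[::-1].replace(old[::-1], new[::-1], 1)[::-1]


# Second sample line from the question (spacing as shown in the post).
line = "ATOM 974 CD ARG A 61 -21.610 7.433 69.314 1.00 23.44 C"

# The right-most "61" is the one inside "-21.610", not the residue number,
# so the coordinate is corrupted and column 6 is left alone.
result = reversed_replace(line, "61", "369")
print(result)
```

Any approach based on searching for the text "61" has this weakness, because the search cannot tell which column a match belongs to.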
Upvotes: 1
Views: 570
Reputation: 15345
You had better not try to parse PDB files on your own.
Use a PDB parser. There are many freely available ones in various bio/computational chemistry packages.
Here's how to do it with Biopython, assuming your input is raw.pdb:
from Bio.PDB import PDBParser, PDBIO
parser=PDBParser()
structure = parser.get_structure('some_id', 'raw.pdb')
for r in structure.get_residues():
    r.id = (r.id[0], r.id[1] + 308, r.id[2])
io = PDBIO()
io.set_structure(structure)
io.save('shifted.pdb')
I googled a bit and found a quick solution to your specific problem (without third-party dependencies) here:
http://code.google.com/p/pdb-tools/
Among many other useful PDB Python script tools, it contains the script pdb_offset.py.
It is a standalone script, and I just copied its pdbOffset function to show it working. Your three-line example is in raw.pdb:
def pdbOffset(pdb_file, offset):
    """
    Adds an offset to the residue column of a pdb file without touching anything
    else.
    """
    # Read in the pdb file
    f = open(pdb_file,'r')
    pdb = f.readlines()
    f.close()
    out = []
    for line in pdb:
        # For an ATOM or TER record, update the residue number.
        # The record name fills the first six columns, padded with spaces.
        if line[0:6] == "ATOM  " or line[0:6] == "TER   ":
            num = offset + int(line[22:26])
            out.append("%s%4i%s" % (line[0:22],num,line[26:]))
        else:
            out.append(line)
    return "".join(out)

print(pdbOffset('raw.pdb', 308))
which prints
ATOM 973 CG ARG A 369 -21.593 8.884 69.770 1.00 25.13 C
ATOM 974 CD ARG A 369 -21.610 7.433 69.314 1.00 23.44 C
ATOM 975 NE ARG A 369 -21.047 7.452 67.937 1.00 12.13 N
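If only one chain should be renumbered, as the question's check for chain A suggests, the same fixed-column idea can be extended. This is a minimal sketch, assuming the standard PDB layout (chain identifier in column 22, residue number in columns 23-26); `shift_residue_number` is a hypothetical name and is not part of pdb-tools.

```python
def shift_residue_number(line, offset, chain=None):
    """Shift the residue number of one ATOM/TER line, keeping all columns aligned."""
    # Assumption: standard fixed-width PDB columns (zero-based slices below).
    if line[0:6].strip() not in ("ATOM", "TER"):
        return line
    if chain is not None and line[21:22] != chain:
        return line
    field = line[22:26]
    if not field.strip():
        return line  # nothing to renumber, e.g. a bare TER record
    text = "%4d" % (int(field) + offset)
    if len(text) != 4:
        # A wider number would push every later column to the right.
        raise ValueError("residue number %s does not fit in 4 columns" % text.strip())
    return line[:22] + text + line[26:]


# Sample line rebuilt with explicit field widths so the column positions are exact.
sample = "ATOM".ljust(6) + "{:>5d} {:<4s} {:>3s} {}{:>4d}".format(973, " CG", "ARG", "A", 61)
sample += " " * 4 + "{:8.3f}{:8.3f}{:8.3f}".format(-21.593, 8.884, 69.770)

shifted = shift_residue_number(sample, 308, chain="A")
print(shifted)
```

Raising an error on overflow is a design choice made here: silently writing a five-digit number would shift the coordinate columns, which is exactly the kind of damage the fixed-column approach is meant to avoid.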
Upvotes: 1
Reputation:
data = """\
ATOM 973 CG ARG A 61 -21.593 8.884 69.770 1.00 25.13 C
ATOM 974 CD ARG A 61 -21.610 7.433 69.314 1.00 23.44 C
ATOM 975 NE ARG A 61 -21.047 7.452 67.937 1.00 12.13 N"""
offset = 308
for line in data.split('\n'):
    line = line[:22] + " {:<5d} ".format(int(line[22:31]) + offset) + line[31:]
    print(line)
I haven't counted the whitespace exactly; the indices are only a rough estimate. If you want more flexibility than having the numbers 22 and 31 scattered through your code, you'll need a way to determine the start and end index (though that conflicts with my assumption that the data is in a fixed-column format).
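One way to avoid scattering bare indices through the code is to name the columns once with `slice` objects. The sketch below assumes the standard PDB ATOM layout (residue number in columns 23-26, x coordinate in columns 31-38); the constant names and `set_field` are hypothetical, chosen only for illustration.

```python
# Zero-based slices for a few fixed-width ATOM fields (assumed standard PDB layout).
CHAIN_ID = slice(21, 22)
RES_SEQ = slice(22, 26)
X_COORD = slice(30, 38)


def set_field(line, field, value):
    """Write `value` right-justified into the columns named by `field`."""
    width = field.stop - field.start
    text = str(value).rjust(width)
    if len(text) != width:
        raise ValueError("%r does not fit in %d columns" % (value, width))
    return line[:field.start] + text + line[field.stop:]


# Sample line rebuilt with explicit field widths so the column positions are exact.
line = "ATOM".ljust(6) + "{:>5d} {:<4s} {:>3s} {}{:>4d}".format(975, " NE", "ARG", "A", 61)
line += " " * 4 + "{:8.3f}{:8.3f}{:8.3f}".format(-21.047, 7.452, 67.937)

offset = 308
new_line = set_field(line, RES_SEQ, int(line[RES_SEQ]) + offset)
print(new_line)
```

Because the replacement is padded to the exact field width, the rest of the line keeps its position, and the other fields can still be read back with the same slices, for example `float(new_line[X_COORD])`.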
Upvotes: 2