Reputation: 41
I have a PDB file of a heterodimeric protein. Unfortunately, all residues have a chain ID of "A". I wish to change the chain ID to "B" for residues with a residue number over 275.
Below are two attempts at this. I'm able to change the chain names so that, when I get the full residue IDs of the model, I see the correct chain ID associated with the correct residues. However, when I try to export the edited PDB, the new chain IDs aren't there. I'm hoping there is a simple solution. (I will try an easier solution with bash in the meantime.)
structure = PDBParser().get_structure(dirPath, 'heterodimer/heterodimer.pdb')
model = structure[0]
chains = structure.get_chains()
chainA = 275
for model in structure:
    for chains in model:
        for residues in chains:
            if residues.get_id()[1] > chainA:
                chains.id = "B"
            else:
                chains.id = "A"
io.set_structure(chains)
savename = "{}_edit.pdb".format(name)
io.save(savename)
or
structure = PDBParser().get_structure(dirPath, 'heterodimer/heterodimer.pdb')
model = structure[0]
chainOriginal = model["A"]
residues = chainOriginal.get_residues()
chainA = 275
for res in residues:
    if res.get_full_id()[3][1] > chainA:
        chainOriginal.id = "B"
    else:
        chainOriginal.id = "A"
io.set_structure(structure)
savename = "{}_edit.pdb".format(name)
io.save(savename)
Thank you in advance for any help. Seems like I'm missing something simple.
P.S. I've also tried converting the tuple from .get_full_id() to a list, editing the value of the chain index, then converting back to tuple. However, after that point I'm stuck.
Upvotes: 1
Views: 1137
Reputation: 3023
OK, I kind of figured out a way to do it.
My input file, test.pdb, runs from residue 28 to residue 38:
ATOM 1 N ARG A 28 5.140 67.453 130.620 1.00 92.14 N
ATOM 2 CA ARG A 28 6.590 67.605 130.291 1.00 92.98 C
ATOM 3 C ARG A 28 7.073 66.494 129.345 1.00 92.59 C
ATOM 4 O ARG A 28 6.604 65.355 129.412 1.00 94.41 O
ATOM 5 CB ARG A 28 7.424 67.592 131.579 1.00 94.44 C
ATOM 6 CG ARG A 28 8.892 68.009 131.399 1.00 98.47 C
ATOM 7 CD ARG A 28 9.057 69.524 131.173 1.00100.38 C
ATOM 8 NE ARG A 28 8.447 70.007 129.929 1.00103.21 N
ATOM 9 CZ ARG A 28 8.993 69.901 128.716 1.00102.43 C
ATOM 10 NH1 ARG A 28 10.180 69.327 128.558 1.00100.66 N
ATOM 11 NH2 ARG A 28 8.343 70.366 127.654 1.00101.21 N
ATOM 12 N THR A 29 8.016 66.838 128.470 1.00 90.22 N
ATOM 13 CA THR A 29 8.577 65.908 127.487 1.00 87.63 C
ATOM 14 C THR A 29 9.851 65.196 127.960 1.00 87.54 C
ATOM 15 O THR A 29 10.650 65.762 128.708 1.00 86.81 O
ATOM 16 CB THR A 29 8.899 66.653 126.179 1.00 86.80 C
ATOM 17 OG1 THR A 29 7.677 67.052 125.551 1.00 86.42 O
ATOM 18 CG2 THR A 29 9.692 65.775 125.233 1.00 86.58 C
ATOM 19 N VAL A 30 10.038 63.955 127.509 1.00 86.06 N
ATOM 20 CA VAL A 30 11.214 63.162 127.876 1.00 85.61 C
ATOM 21 C VAL A 30 11.819 62.420 126.680 1.00 83.99 C
ATOM 22 O VAL A 30 11.138 61.626 126.027 1.00 83.81 O
ATOM 23 CB VAL A 30 10.869 62.127 128.962 1.00 85.41 C
ATOM 24 CG1 VAL A 30 10.443 62.836 130.236 1.00 86.86 C
ATOM 25 CG2 VAL A 30 9.761 61.216 128.474 1.00 85.88 C
ATOM 26 N LYS A 31 13.095 62.687 126.400 1.00 82.01 N
ATOM 27 CA LYS A 31 13.801 62.049 125.285 1.00 80.21 C
ATOM 28 C LYS A 31 14.443 60.767 125.783 1.00 78.14 C
ATOM 29 O LYS A 31 15.316 60.794 126.657 1.00 76.23 O
ATOM 30 CB LYS A 31 14.896 62.962 124.733 1.00 80.16 C
ATOM 31 CG LYS A 31 15.442 62.532 123.376 1.00 79.27 C
ATOM 32 CD LYS A 31 16.752 63.253 123.060 1.00 81.24 C
ATOM 33 CE LYS A 31 16.866 63.612 121.584 1.00 79.14 C
ATOM 34 NZ LYS A 31 15.879 64.670 121.218 1.00 78.95 N
ATOM 35 N LEU A 32 14.023 59.646 125.211 1.00 76.32 N
ATOM 36 CA LEU A 32 14.540 58.355 125.631 1.00 74.56 C
ATOM 37 C LEU A 32 15.356 57.681 124.543 1.00 71.45 C
ATOM 38 O LEU A 32 14.991 57.707 123.367 1.00 70.89 O
ATOM 39 CB LEU A 32 13.373 57.457 126.034 1.00 76.38 C
ATOM 40 CG LEU A 32 13.683 56.242 126.895 1.00 78.46 C
ATOM 41 CD1 LEU A 32 14.416 56.685 128.150 1.00 82.41 C
ATOM 42 CD2 LEU A 32 12.383 55.547 127.262 1.00 79.03 C
ATOM 43 N LEU A 33 16.467 57.078 124.944 1.00 68.54 N
ATOM 44 CA LEU A 33 17.325 56.379 124.003 1.00 65.98 C
ATOM 45 C LEU A 33 17.421 54.882 124.255 1.00 64.69 C
ATOM 46 O LEU A 33 17.763 54.439 125.360 1.00 61.69 O
ATOM 47 CB LEU A 33 18.735 56.957 124.020 1.00 64.32 C
ATOM 48 CG LEU A 33 18.944 58.227 123.202 1.00 65.71 C
ATOM 49 CD1 LEU A 33 20.435 58.515 123.129 1.00 62.52 C
ATOM 50 CD2 LEU A 33 18.368 58.046 121.804 1.00 65.12 C
ATOM 51 N LEU A 34 17.108 54.111 123.216 1.00 62.63 N
ATOM 52 CA LEU A 34 17.203 52.662 123.271 1.00 57.92 C
ATOM 53 C LEU A 34 18.521 52.328 122.608 1.00 56.33 C
ATOM 54 O LEU A 34 18.633 52.392 121.388 1.00 58.81 O
ATOM 55 CB LEU A 34 16.069 52.013 122.482 1.00 58.79 C
ATOM 56 CG LEU A 34 14.715 51.943 123.175 1.00 61.10 C
ATOM 57 CD1 LEU A 34 13.685 51.426 122.200 1.00 58.98 C
ATOM 58 CD2 LEU A 34 14.815 51.031 124.402 1.00 59.55 C
ATOM 59 N LEU A 35 19.527 51.999 123.402 1.00 54.20 N
ATOM 60 CA LEU A 35 20.825 51.653 122.846 1.00 53.19 C
ATOM 61 C LEU A 35 21.157 50.197 123.158 1.00 54.07 C
ATOM 62 O LEU A 35 20.560 49.593 124.054 1.00 54.18 O
ATOM 63 CB LEU A 35 21.911 52.556 123.425 1.00 52.87 C
ATOM 64 CG LEU A 35 21.745 54.072 123.284 1.00 55.11 C
ATOM 65 CD1 LEU A 35 23.086 54.737 123.612 1.00 52.86 C
ATOM 66 CD2 LEU A 35 21.309 54.436 121.872 1.00 49.52 C
ATOM 67 N GLY A 36 22.118 49.642 122.421 1.00 52.59 N
ATOM 68 CA GLY A 36 22.527 48.263 122.639 1.00 50.76 C
ATOM 69 C GLY A 36 23.074 47.644 121.370 1.00 47.34 C
ATOM 70 O GLY A 36 22.770 48.118 120.284 1.00 46.63 O
ATOM 71 N ALA A 37 23.863 46.583 121.494 1.00 45.98 N
ATOM 72 CA ALA A 37 24.441 45.937 120.322 1.00 45.81 C
ATOM 73 C ALA A 37 23.359 45.294 119.479 1.00 46.39 C
ATOM 74 O ALA A 37 22.184 45.266 119.866 1.00 46.64 O
ATOM 75 CB ALA A 37 25.472 44.887 120.742 1.00 45.26 C
ATOM 76 N GLY A 38 23.756 44.768 118.323 1.00 45.96 N
ATOM 77 CA GLY A 38 22.785 44.149 117.436 1.00 43.61 C
ATOM 78 C GLY A 38 22.108 42.942 118.049 1.00 44.49 C
ATOM 79 O GLY A 38 22.775 42.085 118.640 1.00 47.30 O
My code, which aims to move residues numbered above 33 from chain A to chain B:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Fri Dec 9 20:35:20 2022
@author: bob
https://stackoverflow.com/questions/74735845/splitting-and-renaming-protein-chain-with-biopythons-biopdb
"""
from Bio.PDB import PDBParser, PDBIO
from Bio.PDB.Chain import Chain
from Bio.PDB.Model import Model
from Bio.PDB.Structure import Structure

dirPath = ''
name = 'result_A_'
structure = PDBParser().get_structure(dirPath, 'test.pdb')
chainA = 33  # last residue number that stays in chain A

# Collect every residue numbered above the cutoff.
res_to_change = []
for model in structure:
    # print('\n model : ', model , model.get_id(),'\n')
    for chains in model:
        for residues in chains:
            print('residues.get_id() : ', residues.get_id())
            if residues.get_id()[1] > chainA:
                res_to_change.append(residues)
print('residue to change chain : ', res_to_change)
print('\n model : ', model, model.get_id(), '\n')

### SEE https://stackoverflow.com/questions/25884758/deleteing-residue-from-pdb-using-biopython-library
# Detach the collected residues from chain A
# (test.pdb has a single model and a single chain).
for model in structure:
    for chain in model:
        for res in res_to_change:
            chain.detach_child(res.get_id())

### SEE https://stackoverflow.com/questions/33364370/how-to-add-chain-id-in-pdb
# Create chain B in the same model and move the detached residues into it.
my_chain = Chain("B")
model.add(my_chain)
for res in res_to_change:
    my_chain.add(res)

io = PDBIO()
io.set_structure(model)
savename = "{}_edit.pdb".format(name)
io.save(savename, write_end=True, preserve_atom_numbering=True)

### above I detached B chain residues and reattached a new chain B to my model made with detached residues
### below I create an empty structure and attach both old structure chain A with deleted residues and chain B as above
my_structure = Structure('1')
my_model = Model('1')
my_structure.add(my_model)
my_model.add(model['A'])
my_model.add(my_chain)

# Print the new hierarchy: model, chains, residues, atoms.
print(my_model)
for i in my_model:
    print(i)
    for ii in i:
        print(ii)
        for iii in ii:
            print(iii)

io2 = PDBIO()
io2.set_structure(my_model)
savename = "{}_edit_new_model.pdb".format(name)
io2.save(savename, write_end=False, preserve_atom_numbering=True)
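For comparison, here is a minimal sketch of the same reassignment done directly on the text of the file, without Bio.PDB. It assumes standard fixed-width PDB records (chain ID in column 22, residue number in columns 23-26) and only touches ATOM and HETATM lines; TER and ANISOU records are left as they are. The name `reassign_chain` and its parameters are made up for illustration and are not part of any library.

```python
def reassign_chain(lines, cutoff, new_chain="B"):
    """Set the chain ID to new_chain on coordinate records whose
    residue number is greater than cutoff.

    Assumes fixed-width PDB records: chain ID in column 22 and residue
    number in columns 23-26 (1-based), i.e. line[21] and line[22:26].
    """
    out = []
    for line in lines:
        if line.startswith(("ATOM", "HETATM")) and int(line[22:26]) > cutoff:
            # Replace only the single chain ID character.
            line = line[:21] + new_chain + line[22:]
        out.append(line)
    return out


# Two records in fixed-width layout, using values from test.pdb.
records = [
    "ATOM     50  CD2 LEU A  33      18.368  58.046 121.804  1.00 65.12           C",
    "ATOM     51  N   LEU A  34      17.108  54.111 123.216  1.00 62.63           N",
]
for line in reassign_chain(records, cutoff=33):
    print(line)
```

For a whole file, the lines could come from `open(path).read().splitlines()`, with the cutoff set to 275 for the heterodimer in the question.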
The script saves two PDB files, both of which look correct to me. I tried both ways because I was getting wrong atom numbering on the chain A TER record, such as:
ATOM 49 CD1 LEU A 33 20.435 58.515 123.129 1.00 62.52 C
ATOM 50 CD2 LEU A 33 18.368 58.046 121.804 1.00 65.12 C
TER 51 LEU A 33 <-------------- ERROR ??
ATOM 51 N LEU B 34 17.108 54.111 123.216 1.00 62.63 N
ATOM 52 CA LEU B 34 17.203 52.662 123.271 1.00 57.92 C
I wasn't able to figure out why. I got better results using:
io.save(savename, write_end = False, preserve_atom_numbering = True)
Setting preserve_atom_numbering to True rather than False (the default) makes the difference; see:
ATOM 49 CD1 LEU A 33 20.435 58.515 123.129 1.00 62.52 C
ATOM 50 CD2 LEU A 33 18.368 58.046 121.804 1.00 65.12 C
TER 50 LEU A 33 <------------------- Here !!!
ATOM 51 N LEU B 34 17.108 54.111 123.216 1.00 62.63 N
ATOM 52 CA LEU B 34 17.203 52.662 123.271 1.00 57.92 C
I don't understand why; hopefully somebody here can help.
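One possible reading, offered as a sketch and not as an explanation of what Biopython does internally: as far as I can tell from the wwPDB format description, a TER record takes the serial number one greater than the preceding ATOM/HETATM record, and the next chain continues counting after it. Under that reading, `TER 51` after atom 50 follows the convention, and the odd part of the first listing is that the next ATOM also carries 51. The stdlib-only code below renumbers coordinate records from 1 and writes TER records that way. The names `renumber_with_ter` and `_ter` are hypothetical, input lines are assumed to be fixed-width and without trailing newlines, and every record that is not ATOM or HETATM is dropped.

```python
def _ter(serial, atom_line):
    """Build a TER record that repeats the residue name, chain ID,
    residue number and insertion code of the given coordinate line."""
    return "TER   %5d      %s %s%s" % (
        serial, atom_line[17:20], atom_line[21], atom_line[22:27])


def renumber_with_ter(lines):
    """Renumber ATOM/HETATM serials from 1 and add a TER after each chain.

    Assumption: TER takes the serial after the last atom of its chain,
    and the next chain's first atom continues from there.
    """
    out = []
    serial = 0
    prev = None  # last coordinate line written
    for line in lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue  # this sketch keeps coordinate records only
        if prev is not None and line[21] != prev[21]:
            # Chain ID changed: close the previous chain first.
            serial += 1
            out.append(_ter(serial, prev))
        serial += 1
        line = line[:6] + "%5d" % serial + line[11:]
        out.append(line)
        prev = line
    if prev is not None:
        out.append(_ter(serial + 1, prev))
    return out


# Last atom of chain A and first atom of chain B, fixed-width layout.
split_records = [
    "ATOM     50  CD2 LEU A  33      18.368  58.046 121.804  1.00 65.12           C",
    "ATOM     51  N   LEU B  34      17.108  54.111 123.216  1.00 62.63           N",
]
for line in renumber_with_ter(split_records):
    print(line)
```

This renumbers from 1 and does not keep the original serials, so it is not a drop-in replacement for `preserve_atom_numbering = True`; it only illustrates how the serials line up when TER gets a number of its own.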
Upvotes: 1