user366312

Reputation: 16908

How to extract chain-IDs from PDB files?

I have all the PDB files stored on my local hard disk. The files are in pdbXXXX.ent.gz format.

I have a Python program that reads a text file, which must be in the following format:

pdb_id  chain_id  resolution

How can I prepare this plain text file from all those PDB files?

Upvotes: 0

Views: 937

Answers (1)

Lydia van Dyke

Reputation: 2516

You can parse a PDB file with Biopython even when it is gzip-compressed. You just need to be careful to open the file in text mode ("rt"): gzip.open defaults to binary mode, and with a binary handle you end up with a TypeError.
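
To see what the mode flag changes, here is a minimal sketch that uses only the standard library. The file name and its one-line content are made up for illustration; no Biopython is involved.

```python
import gzip
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical sample file, created only so the example is self-contained.
    sample_path = Path(tmp_dir) / "sample.ent.gz"
    with gzip.open(sample_path, "wt") as handle:
        handle.write("HEADER    EXAMPLE\n")

    # Text mode: lines come back as str.
    with gzip.open(sample_path, "rt") as handle:
        text_line = handle.readline()

    # Binary mode (the default for gzip.open): lines come back as bytes.
    with gzip.open(sample_path, "rb") as handle:
        bytes_line = handle.readline()

print(type(text_line).__name__, type(bytes_line).__name__)
```

The parser works on text lines, so the str-returning handle is the one to pass to it.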

I tested the following script with a rather small sample: four gzipped PDB entries in a local folder.

import gzip
import warnings
from pathlib import Path
from Bio.PDB.PDBExceptions import PDBConstructionWarning
from Bio.PDB import PDBParser

# To get rid of those annoying warnings like 'WARNING: Chain B is discontinuous at line 4059.'
warnings.simplefilter('ignore', PDBConstructionWarning)

parser = PDBParser()

if __name__ == "__main__":
    pdb_zips = Path("zipped_pdbs").glob('**/*.ent.gz')
    for pdb_filename in pdb_zips:
        with gzip.open(pdb_filename, "rt") as file_handle:
            structure = parser.get_structure("?", file_handle)
        # You could of course parse the PDB code from the file name as well,
        # but I found reading it from the header easier to implement.
        pdb_code = structure.header.get("idcode")
        resolution = structure.header.get("resolution")

        for chain in structure.get_chains():
            print(f"{pdb_code}  {chain.id}  {resolution}")

The output reads

7LWV  A  3.12
7LWV  B  3.12
7LWV  C  3.12
6U9D  A  3.19
6U9D  B  3.19
6U9D  C  3.19
6U9D  D  3.19
6U9D  E  3.19
6U9D  F  3.19
6U9D  G  3.19
6U9D  H  3.19
6U9D  I  3.19
6U9D  J  3.19
6U9D  K  3.19
6U9D  L  3.19
6U9D  M  3.19
6U9D  N  3.19
6U9D  O  3.19
6U9D  P  3.19
6U9D  Q  3.19
6U9D  R  3.19
6U9D  S  3.19
6U9D  T  3.19
6U9D  U  3.19
6U9D  V  3.19
6U9D  W  3.19
6U9D  X  3.19
1F34  A  2.45
1F34  B  2.45
2OXP  A  2.0
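
The question asks for a text file rather than printed output, and the script's comment mentions taking the PDB code from the file name instead of the header. Below is a minimal sketch of both, assuming the pdbXXXX.ent.gz naming from the question. The names pdb_id_from_filename, write_rows, and chains.txt are hypothetical helpers for illustration, not part of Biopython, and the rows are copied from the sample output above.

```python
import re
import tempfile
from pathlib import Path

# Assumed naming scheme from the question: pdbXXXX.ent.gz, XXXX being the 4-character ID.
PDB_NAME_PATTERN = re.compile(r"^pdb([0-9a-z]{4})\.ent\.gz$", re.IGNORECASE)


def pdb_id_from_filename(filename):
    """Return the upper-cased PDB ID, or None if the name does not match."""
    match = PDB_NAME_PATTERN.match(Path(filename).name)
    return match.group(1).upper() if match else None


def write_rows(rows, out_path):
    """Write (pdb_id, chain_id, resolution) tuples, one per line."""
    with open(out_path, "w") as out_file:
        for pdb_id, chain_id, resolution in rows:
            out_file.write(f"{pdb_id}  {chain_id}  {resolution}\n")


# Demo with rows taken from the sample output; in the real script these
# tuples would be collected inside the loop over structure.get_chains().
pdb_id = pdb_id_from_filename("zipped_pdbs/pdb1f34.ent.gz")
rows = [(pdb_id, "A", 2.45), (pdb_id, "B", 2.45)]

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = Path(tmp_dir) / "chains.txt"  # hypothetical output file name
    write_rows(rows, out_path)
    written = out_path.read_text()

print(written, end="")
```

Collecting the rows first and writing them once keeps the parsing loop free of file handling. Entries without a resolution in the header would be written as None, so you may want to filter those out before writing.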

Upvotes: 1
