Reputation: 16908
I have all the PDB files stored on my local hard disk. The files are gzip-compressed and named pdbXXXX.ent.gz.
I have a Python program that reads a text file, which must be in the following format:
pdb_id chain_id resolution
How can I prepare this plain text file from all those PDB files?
Upvotes: 0
Views: 937
Reputation: 2516
You can parse a PDB file with Biopython even when it is gzip-compressed. You just need to open the file in text mode ("rt"); otherwise the parser receives bytes instead of strings and you end up with a TypeError.
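To make the text-mode point concrete, here is a minimal sketch using only the standard library. The file name and the HEADER line are made up for illustration; the helper `first_line` is hypothetical and not part of Biopython.

```python
import gzip
import tempfile
from pathlib import Path


def first_line(path, mode):
    """Return the first line of a gzip-compressed file opened in the given mode."""
    with gzip.open(path, mode) as handle:
        return handle.readline()


# Create a tiny gzip file with a made-up HEADER record (illustrative only).
tmp_dir = Path(tempfile.mkdtemp())
sample = tmp_dir / "pdb0xyz.ent.gz"
with gzip.open(sample, "wt") as handle:
    handle.write("HEADER    EXAMPLE\n")

text_line = first_line(sample, "rt")   # text mode yields str
bytes_line = first_line(sample, "rb")  # binary mode yields bytes

print(type(text_line).__name__, type(bytes_line).__name__)
```

A parser that expects strings can work with the first result directly, while the second would have to be decoded first.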
I tested the following script on a rather small sample: four gzip-compressed PDB entries in a local folder.
import gzip
import warnings
from pathlib import Path

from Bio.PDB.PDBExceptions import PDBConstructionWarning
from Bio.PDB import PDBParser

# To get rid of those annoying warnings like 'WARNING: Chain B is discontinuous at line 4059.'
warnings.simplefilter('ignore', PDBConstructionWarning)

parser = PDBParser()

if __name__ == "__main__":
    # Recursively collect every compressed PDB entry below the folder.
    pdb_zips = Path("zipped_pdbs").glob('**/*.ent.gz')
    for pdb_filename in pdb_zips:
        # Text mode ("rt") is required so the parser receives strings, not bytes.
        with gzip.open(pdb_filename, "rt") as file_handle:
            structure = parser.get_structure("?", file_handle)
        # You could of course parse the PDB code from the file name as well,
        # but I found reading it from the header easier to implement.
        pdb_code = structure.header.get("idcode")
        resolution = structure.header.get("resolution")
        # One output line per chain: pdb_id chain_id resolution
        for chain in structure.get_chains():
            print(f"{pdb_code} {chain.id} {resolution}")
The output reads:
7LWV A 3.12
7LWV B 3.12
7LWV C 3.12
6U9D A 3.19
6U9D B 3.19
6U9D C 3.19
6U9D D 3.19
6U9D E 3.19
6U9D F 3.19
6U9D G 3.19
6U9D H 3.19
6U9D I 3.19
6U9D J 3.19
6U9D K 3.19
6U9D L 3.19
6U9D M 3.19
6U9D N 3.19
6U9D O 3.19
6U9D P 3.19
6U9D Q 3.19
6U9D R 3.19
6U9D S 3.19
6U9D T 3.19
6U9D U 3.19
6U9D V 3.19
6U9D W 3.19
6U9D X 3.19
1F34 A 2.45
1F34 B 2.45
2OXP A 2.0
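Since the question asks for a plain text file rather than console output, here is a rough sketch of how the same loop could write its rows to disk and take the PDB code from the file name instead of the header. The helper names (`pdb_id_from_filename`, `iter_rows`, `write_rows`) and the output file name `pdb_chains.txt` are my own assumptions, not anything from Biopython; Biopython is imported only inside `iter_rows`, so the other helpers work without it.

```python
import gzip
import re
import warnings
from pathlib import Path


def pdb_id_from_filename(path):
    """Return the upper-cased ID from a name like pdb1f34.ent.gz, or None."""
    match = re.fullmatch(r"pdb([0-9A-Za-z]{4})\.ent\.gz", Path(path).name)
    return match.group(1).upper() if match else None


def iter_rows(folder):
    """Yield 'pdb_id chain_id resolution' lines for every *.ent.gz below folder."""
    # Imported here so the other helpers stay usable without Biopython installed.
    from Bio.PDB import PDBParser
    from Bio.PDB.PDBExceptions import PDBConstructionWarning

    warnings.simplefilter("ignore", PDBConstructionWarning)
    parser = PDBParser()
    for path in sorted(Path(folder).glob("**/*.ent.gz")):
        with gzip.open(path, "rt") as handle:
            structure = parser.get_structure(path.name, handle)
        # Fall back to the header when the file name does not match the pattern.
        pdb_id = pdb_id_from_filename(path) or structure.header.get("idcode")
        resolution = structure.header.get("resolution")
        for chain in structure.get_chains():
            yield f"{pdb_id} {chain.id} {resolution}"


def write_rows(rows, out_path):
    """Write one row per line to out_path."""
    with open(out_path, "w") as out:
        for row in rows:
            out.write(row + "\n")


# Hypothetical usage, assuming the same folder as in the script above:
# write_rows(iter_rows("zipped_pdbs"), "pdb_chains.txt")
```

One thing to watch: for entries whose header carries no resolution, `structure.header.get("resolution")` comes back as `None`, so you may want to skip or mark those rows depending on what the downstream program accepts.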
Upvotes: 1