Reputation: 289
I want to make a plot with Matplotlib in Python, so I need to read some data from a PDB file (Protein Data Bank). I want to extract every column from the file and store each column in a separate vector. The PDB file consists of columns with both text and floats. I'm very new to Matplotlib and I have tried several suggested methods to extract these columns, but nothing seems to work. What would be the best way to extract these columns? I'm going to load a lot of data at a later stage, so the method shouldn't be too inefficient.
The PDB file looks something like this:
ATOM 1 CA MET A 1 38.012 8.932 -1.253
ATOM 2 CA GLU A 2 39.809 5.652 -1.702
ATOM 3 CA ALA A 3 43.007 5.013 0.368
ATOM 4 CA ALA A 4 41.646 7.577 2.820
ATOM 5 CA HIS A 5 42.611 4.898 5.481
ATOM 6 CA SER A 6 46.191 5.923 5.090
ATOM 7 CA LYS A 7 45.664 9.815 5.134
ATOM 8 CA SER A 8 45.898 12.022 8.181
ATOM 9 CA THR A 9 42.528 13.075 9.570
ATOM 10 CA GLU A 10 43.330 16.633 8.378
ATOM 11 CA GLU A 11 44.171 15.729 4.757
ATOM 12 CA CYS A 12 40.589 14.150 4.745
ATOM 13 CA LEU A 13 38.984 17.314 6.105
ATOM 14 CA ALA A 14 40.633 19.053 3.220
ATOM 15 CA TYR A 15 39.740 16.682 0.505
ATOM 16 CA PHE A 16 36.138 17.421 1.566
ATOM 17 CA GLY A 17 36.536 20.854 2.826
ATOM 18 CA VAL A 18 34.184 20.012 5.553
ATOM 19 CA SER A 19 34.483 20.966 9.177
Upvotes: 1
Views: 3182
Reputation: 13
This tutorial might help: https://py-packman.readthedocs.io/en/latest/tutorials/molecule.html#tutorials-molecule
from packman import molecule
Protein = molecule.load_structure('/path/to/PDB/file.pdb')
#molecule.download_structure('1prw','1prw.pdb') if you want to download PDB file 1prw.pdb
for i in Protein[0].get_atoms():
    # Iterating over atom objects (parent = residue)
    print(i.get_name(), i.get_id(), i.get_location(), i.get_parent().get_name())
The loop above shows how to get the name of each atom (i.get_name()), the id of each atom (i.get_id()), and so on.
It is possible to extract all the components of the PDB file. Please read the PACKMAN documentation for the details.
Disclosure: Author of the package py-packman
Upvotes: 0
Reputation: 11
The Protein Data Bank (pdb) file format is a textual file format describing the three-dimensional structures of molecules held in the Protein Data Bank. The pdb format accordingly provides for description and annotation of protein and nucleic acid structures, including atomic coordinates, observed sidechain rotamers, secondary structure assignments, and atomic connectivity. I found this description on Google.
As for extracting the columns, you can also find the answer on Google or a wiki.
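For reference, here is a minimal sketch of the column extraction in plain Python. It assumes the file really is whitespace-separated with exactly nine fields per ATOM line, as in the sample posted in the question. The function name read_columns and the column labels are made up for illustration. Real PDB files use fixed-width columns, and neighbouring fields can run together, so slicing each line by character position is likely more robust for those.

```python
from io import StringIO

# Three lines copied from the sample in the question.
SAMPLE = """\
ATOM 1 CA MET A 1 38.012 8.932 -1.253
ATOM 2 CA GLU A 2 39.809 5.652 -1.702
ATOM 3 CA ALA A 3 43.007 5.013 0.368
"""


def read_columns(lines):
    """Split whitespace-separated ATOM lines into one list per column."""
    # Column labels are hypothetical; they are not part of any library.
    columns = {name: [] for name in
               ("serial", "atom", "residue", "chain", "resseq", "x", "y", "z")}
    for line in lines:
        fields = line.split()
        # Skip blank lines, non-ATOM records and lines with an unexpected layout.
        if len(fields) != 9 or fields[0] != "ATOM":
            continue
        columns["serial"].append(int(fields[1]))
        columns["atom"].append(fields[2])
        columns["residue"].append(fields[3])
        columns["chain"].append(fields[4])
        columns["resseq"].append(int(fields[5]))
        columns["x"].append(float(fields[6]))
        columns["y"].append(float(fields[7]))
        columns["z"].append(float(fields[8]))
    return columns


# Any iterable of lines works, including an open file handle.
cols = read_columns(StringIO(SAMPLE))
print(cols["residue"])
print(cols["x"])
```

To read from disk, pass an open file instead of the StringIO object, for example: with open(path) as fh: cols = read_columns(fh). The resulting lists can be handed straight to Matplotlib, e.g. plt.plot(cols["x"], cols["y"]).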
Upvotes: 1
Reputation: 5593
Going off of @Kyle_S-C's recommendation, here's a way to do it using Biopython.
First read your file into a Biopython Structure object:
import Bio.PDB
path = '/path/to/PDB/file' # your file path here
p = Bio.PDB.PDBParser()
structure = p.get_structure('myStructureName', path)
Then, for example, you can get a list of just the Atom ids like this:
ids = [a.get_id() for a in structure.get_atoms()]
See the Biopython Structural Bioinformatics FAQ for more, including the following methods for accessing the PDB columns for an Atom:
How do I extract information from an Atom object?
Using the following methods:
# a.get_name()       # atom name (spaces stripped, e.g. 'CA')
# a.get_id()         # id (equals atom name)
# a.get_coord()      # atomic coordinates
# a.get_vector()     # atomic coordinates as Vector object
# a.get_bfactor()    # isotropic B factor
# a.get_occupancy()  # occupancy
# a.get_altloc()     # alternative location specifier
# a.get_sigatm()     # std. dev. of atomic parameters
# a.get_siguij()     # std. dev. of anisotropic B factor
# a.get_anisou()     # anisotropic B factor
# a.get_fullname()   # atom name (with spaces, e.g. '.CA.')
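To get the separate per-column vectors the question asks for, the accessors above can be collected in a single pass. The following is only a sketch: atom_columns, load_columns and the dictionary keys are hypothetical names, not part of Biopython, and it assumes the file is a properly formatted PDB file that PDBParser can read.

```python
def atom_columns(atoms):
    """Collect per-atom values into one list per column."""
    # The dictionary keys are illustrative labels only.
    columns = {"name": [], "resname": [], "x": [], "y": [], "z": []}
    for atom in atoms:
        x, y, z = atom.get_coord()  # three coordinates per atom
        columns["name"].append(atom.get_name())
        # The parent of an Atom is its Residue.
        columns["resname"].append(atom.get_parent().get_resname())
        columns["x"].append(float(x))
        columns["y"].append(float(y))
        columns["z"].append(float(z))
    return columns


def load_columns(path):
    """Parse a PDB file with Biopython and return its atom columns."""
    # Imported here so that atom_columns does not itself require Biopython.
    from Bio.PDB import PDBParser

    parser = PDBParser(QUIET=True)  # QUIET suppresses parser warnings
    structure = parser.get_structure("myStructureName", path)
    return atom_columns(structure.get_atoms())
```

Calling load_columns('/path/to/PDB/file') should then give lists that can be passed to Matplotlib directly, e.g. plt.scatter(cols["x"], cols["y"]).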
Upvotes: 0