Reputation: 289

Extract Columns from a Protein Data Bank (PDB) Text File

I want to make a plot with Matplotlib in Python, so I need to read some data from a PDB (Protein Data Bank) file. I want to extract every column from the file and store each column in a separate vector. The PDB file consists of columns containing both text and floats. I'm very new to Matplotlib and have tried several suggested methods for extracting these columns, but nothing seems to work. What would be the best way to extract them? I'm going to load a lot of data at a later stage, so the method shouldn't be too inefficient.

The PDB file looks something like this:

ATOM      1  CA  MET A   1      38.012   8.932  -1.253
ATOM      2  CA  GLU A   2      39.809   5.652  -1.702
ATOM      3  CA  ALA A   3      43.007   5.013   0.368
ATOM      4  CA  ALA A   4      41.646   7.577   2.820
ATOM      5  CA  HIS A   5      42.611   4.898   5.481
ATOM      6  CA  SER A   6      46.191   5.923   5.090
ATOM      7  CA  LYS A   7      45.664   9.815   5.134
ATOM      8  CA  SER A   8      45.898  12.022   8.181
ATOM      9  CA  THR A   9      42.528  13.075   9.570
ATOM     10  CA  GLU A  10      43.330  16.633   8.378
ATOM     11  CA  GLU A  11      44.171  15.729   4.757
ATOM     12  CA  CYS A  12      40.589  14.150   4.745
ATOM     13  CA  LEU A  13      38.984  17.314   6.105
ATOM     14  CA  ALA A  14      40.633  19.053   3.220
ATOM     15  CA  TYR A  15      39.740  16.682   0.505
ATOM     16  CA  PHE A  16      36.138  17.421   1.566
ATOM     17  CA  GLY A  17      36.536  20.854   2.826
ATOM     18  CA  VAL A  18      34.184  20.012   5.553
ATOM     19  CA  SER A  19      34.483  20.966   9.177

Upvotes: 1

Answers (3)

Pranav Khade

Reputation: 13

This tutorial might help: https://py-packman.readthedocs.io/en/latest/tutorials/molecule.html#tutorials-molecule

from packman import molecule

Protein = molecule.load_structure('/path/to/PDB/file.pdb')
# To download PDB entry 1prw to '1prw.pdb' first, use:
# molecule.download_structure('1prw', '1prw.pdb')


for i in Protein[0].get_atoms():
    # Iterate over the atom objects (each atom's parent is its residue)
    print(i.get_name(), i.get_id(), i.get_location(), i.get_parent().get_name())

The calls above show how to get each atom's name (i.get_name()), its id (i.get_id()), its location (i.get_location()), and the name of its parent residue (i.get_parent().get_name()).

All other components of the PDB file can be extracted as well; please read the PACKMAN documentation for details.

Disclosure: I am the author of the py-packman package.

Upvotes: 0

sabrinawang

Reputation: 11

The Protein Data Bank (PDB) file format is a textual file format describing the three-dimensional structures of molecules held in the Protein Data Bank. The PDB format accordingly provides for description and annotation of protein and nucleic acid structures, including atomic coordinates, observed sidechain rotamers, secondary structure assignments, and atomic connectivity. (I found this on Google.)

As for extracting the columns, you can also find the answer on Google or a wiki.
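To make that a bit more concrete: PDB is a fixed-width format, so splitting on whitespace can break when fields touch or are blank, and slicing each line by character position is usually safer. Below is a minimal sketch of that idea using only the standard library. The helper names parse_atom_line and read_pdb_columns are made up for illustration, the column positions are the ones given for ATOM/HETATM records in the wwPDB format description, and only the fields visible in the question's sample are read.

```python
# Hypothetical helpers (not from any library): slice fixed-width PDB records.
FIELDS = ("serial", "name", "resname", "chain", "resseq", "x", "y", "z")


def parse_atom_line(line):
    """Slice one ATOM/HETATM record into its fields by column position."""
    return {
        "serial": int(line[6:11]),        # columns 7-11: atom serial number
        "name": line[12:16].strip(),      # columns 13-16: atom name
        "resname": line[17:20].strip(),   # columns 18-20: residue name
        "chain": line[21],                # column 22: chain identifier
        "resseq": int(line[22:26]),       # columns 23-26: residue number
        "x": float(line[30:38]),          # columns 31-38: x coordinate
        "y": float(line[38:46]),          # columns 39-46: y coordinate
        "z": float(line[46:54]),          # columns 47-54: z coordinate
    }


def read_pdb_columns(lines):
    """Return one list per column, skipping non-atom records."""
    columns = {key: [] for key in FIELDS}
    for line in lines:
        if line.startswith(("ATOM", "HETATM")):
            record = parse_atom_line(line)
            for key in FIELDS:
                columns[key].append(record[key])
    return columns
```

A file object can be passed directly, for example `with open(path) as handle: cols = read_pdb_columns(handle)`, after which `cols["x"]`, `cols["y"]` and `cols["z"]` are plain lists of floats that Matplotlib accepts as they are.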

Upvotes: 1

leekaiinthesky

Reputation: 5593

Going off of @Kyle_S-C's recommendation, here's a way to do it using Biopython.

First read your file into a Biopython Structure object:

import Bio.PDB
path = '/path/to/PDB/file' # your file path here
p = Bio.PDB.PDBParser()
structure = p.get_structure('myStructureName', path)

Then, for example, you can get a list of just the Atom ids like this:

ids = [a.get_id() for a in structure.get_atoms()]

See the Biopython Structural Bioinformatics FAQ for more, including the following methods for accessing the PDB columns for an Atom:

How do I extract information from an Atom object?

Using the following methods:

# a.get_name()           # atom name (spaces stripped, e.g. 'CA')
# a.get_id()             # id (equals atom name)
# a.get_coord()          # atomic coordinates
# a.get_vector()         # atomic coordinates as Vector object
# a.get_bfactor()        # isotropic B factor
# a.get_occupancy()      # occupancy
# a.get_altloc()         # alternative location specifier
# a.get_sigatm()         # std. dev. of atomic parameters
# a.get_siguij()         # std. dev. of anisotropic B factor
# a.get_anisou()         # anisotropic B factor
# a.get_fullname()       # atom name (with spaces, e.g. '.CA.')
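
As a rough sketch of how the methods listed above could be combined to get one list per column: the helper name atoms_to_columns is hypothetical (it is not part of Biopython), and it only assumes that each atom offers get_name(), get_coord() and get_parent(), and that the parent residue offers get_resname().

```python
def atoms_to_columns(atoms):
    """Collect per-atom fields into parallel lists, one list per column."""
    columns = {"name": [], "resname": [], "x": [], "y": [], "z": []}
    for atom in atoms:
        x, y, z = atom.get_coord()  # three coordinates per atom
        columns["name"].append(atom.get_name())
        columns["resname"].append(atom.get_parent().get_resname())
        # Convert to plain Python floats so the lists are easy to reuse
        columns["x"].append(float(x))
        columns["y"].append(float(y))
        columns["z"].append(float(z))
    return columns


# Assumed usage with the `structure` object built earlier in this answer:
# cols = atoms_to_columns(structure.get_atoms())
# cols["x"], cols["y"], cols["z"] can then be passed to Matplotlib.
```

Keeping the columns in a dict of lists is just one possible layout; for large files, converting the coordinate lists to NumPy arrays afterwards may be more convenient.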

Upvotes: 0
