Combining multiple queries with numpy masks

Question

I am trying to plot two kinds of rows from a file that looks like this:

number          pair        atom       count         shift      error
 1            ALA ALA       CA         7624           1.35           0.13
 1            ALA ALA       HA         7494          19.67          11.44
38            ARG LYS       CA         3395          35.32           9.52
38            ARG LYS       HA         3217           1.19           0.38
38            ARG LYS       CB         3061           0.54           1.47
39            ARG MET       CA         1115          35.62          13.08
39            ARG MET       HA         1018           1.93           0.20
39            ARG MET       CB          976           1.80           0.34

I want to plot the rows that contain atom CA against the rows that contain atom CB, using their values (the shift column in the code below). So basically I want to do:

atomtypemask_ca = data['atom'] == 'CA'
xaxis = np.array(data['shift'][atomtypemask_ca])
aa, atom = data['aa'][atomtypemask_ca], data['atom'][atomtypemask_ca]

atomtypemask_cb = data['atom'] == 'CB'
yaxis = np.array(data['shift'][atomtypemask_cb])

plot(xaxis, yaxis)

What is ruining my day is that some entries don't have a CB row. How can I plot this while ignoring entries that have only one of the two atom values set? I could of course write it out by hand, but I think this should be possible using masks, which would give cleaner code.

Avaris · Accepted Answer

I'm guessing the first column is the residue number. Use that. I don't know your data structure or what shift refers to, but you should be able to do something like this:

In : residues
Out: array([ 1,  1, 38, 38, 38, 39, 39, 39])

In : atom
Out: 
array(['CA', 'HA', 'CA', 'HA', 'CB', 'CA', 'HA', 'CB'], 
      dtype='|S2')

In : shift
Out: array([7624, 7494, 3395, 3217, 3061, 1115, 1018,  976])

# rows with name 'CB'
In : cb = atom=='CB'

# rows with name 'CA' _and_ residues same as 'CB'
In : ca = numpy.logical_and(numpy.in1d(residues, residues[cb]), atom=='CA')
# or, if in1d is not available (newer NumPy releases recommend numpy.isin instead), a plain list comprehension:
# ca = numpy.logical_and([(residue in residues[cb]) for residue in residues], atom=='CA')

In : shift[ca]
Out: array([3395, 1115])

In : shift[cb]
Out: array([3061,  976])
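For reference, here is a minimal self-contained sketch of the same idea, applied to the shift column of the sample table in the question. It assumes the columns are already loaded as plain NumPy arrays (the names residues, atom and shift follow the session above), that each residue has at most one CA row and one CB row, and that rows are grouped by residue in the same order, so the two results line up element by element. It uses numpy.isin, the newer spelling of numpy.in1d, and filters both masks symmetrically so a residue missing either atom is dropped.

```python
import numpy as np

# Columns copied from the sample table in the question (assumed already parsed).
residues = np.array([1, 1, 38, 38, 38, 39, 39, 39])
atom = np.array(['CA', 'HA', 'CA', 'HA', 'CB', 'CA', 'HA', 'CB'])
shift = np.array([1.35, 19.67, 35.32, 1.19, 0.54, 35.62, 1.93, 1.80])

# One boolean mask per atom type.
is_ca = atom == 'CA'
is_cb = atom == 'CB'

# Residue numbers that have both a CA row and a CB row.
paired = np.intersect1d(residues[is_ca], residues[is_cb])

# Keep only CA/CB rows whose residue appears in the paired set.
ca = is_ca & np.isin(residues, paired)
cb = is_cb & np.isin(residues, paired)

xaxis = shift[ca]  # CA shifts for residues 38 and 39
yaxis = shift[cb]  # CB shifts for residues 38 and 39
```

Residue 1 has no CB row, so it drops out of both arrays, and xaxis and yaxis end up with the same length and can be passed straight to plot(xaxis, yaxis). If the rows were not grouped by residue in a consistent order, the two arrays would need to be sorted by residue number first.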
