Christian Opitz

Reputation: 23

Numpy: Efficient access to sub-arrays generated by numpy.split

I have the following code that generates a list of sub-arrays with np.array_split. I compare the first value (mz) of consecutive tuples and start a new sub-array wherever the difference exceeds 2. So far so good.

import numpy as np

f = np.genfromtxt("d_n_isogro_ms.txt", names=True, dtype=None, usecols=(1,-1))

dm  = np.absolute(np.diff(f['mz']))
pos = np.where(dm > 2)[0] + 1
fsplit = np.array_split(f, pos)

This is what the sample input looks like (only an excerpt):

[(270.0332, 472) (271.0376, 1936) (272.0443, 11188) (273.0495, 65874)
 (274.0517, 8582) (275.0485, 4081) (276.0523, 659) (286.058, 1078)
 (287.0624, 4927) (288.0696, 22481) (289.0757, 84001) (290.078, 13688)
 (291.0746, 5402) (430.1533, 13995) (431.1577, 2992) (432.1685, 504)]
<type 'numpy.ndarray'>

The split positions for this particular data are then computed as:

pos = [7, 13]

And here is my sample output:

[array([(270.0332, 472), (271.0376, 1936), (272.0443, 11188),
       (273.0495, 65874), (274.0517, 8582), (275.0485, 4081),
       (276.0523, 659)], dtype=[('mz', '<f8'), ('I', '<i8')]),
array([(286.058, 1078), (287.0624, 4927), (288.0696, 22481),
   (289.0757, 84001), (290.078, 13688), (291.0746, 5402)], 
  dtype=[('mz', '<f8'), ('I', '<i8')]),
array([(430.1533, 13995),
   (431.1577, 2992), (432.1685, 504)], 
  dtype=[('mz', '<f8'), ('I', '<i8')])]
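For anyone without the data file, the split can be reproduced from the excerpt alone. This is only a minimal sketch, assuming the excerpt is typed in by hand as a structured array with the same field names (mz, I); the name `sample` is made up for illustration and stands in for the genfromtxt result:

```python
import numpy as np

# Excerpt from above, entered by hand (hypothetical stand-in for the file contents).
sample = np.array(
    [(270.0332, 472), (271.0376, 1936), (272.0443, 11188), (273.0495, 65874),
     (274.0517, 8582), (275.0485, 4081), (276.0523, 659), (286.058, 1078),
     (287.0624, 4927), (288.0696, 22481), (289.0757, 84001), (290.078, 13688),
     (291.0746, 5402), (430.1533, 13995), (431.1577, 2992), (432.1685, 504)],
    dtype=[("mz", "<f8"), ("I", "<i8")],
)

# Same steps as in the code at the top of the question.
dm = np.absolute(np.diff(sample["mz"]))
pos = np.where(dm > 2)[0] + 1          # index where each new sub-array starts
fsplit = np.array_split(sample, pos)

# Show the size and mz range of each sub-array.
for part in fsplit:
    print(len(part), part["mz"][0], part["mz"][-1])
```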

I would like to compute the weighted average of each of these sub-arrays. Is there an efficient way of doing this with numpy? I keep getting the indexing wrong. Preferably, I would like to use the dtype field names to pick out the values (mz) and the weights (I).

Maybe the whole operation could even be done on the fly, without building the list of sub-arrays first.

Thank you very much for your help in advance.

Best, Christian

Upvotes: 2

Views: 167

Answers (1)

user2379410

The output of np.array_split is a Python list containing arrays of unequal lengths, so there is no single array to operate on. If you keep that list, the best you can do is a Python loop:

result = [np.average(f_i['mz'], weights=f_i['I']) for f_i in fsplit]

It is possible, however, to come up with a completely vectorized solution by using np.add.reduceat instead of np.array_split:

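dm = np.abs(np.diff(f['mz']))
# Start index of every group; the leading True makes index 0 the first start
pos = np.flatnonzero(np.r_[True, dm > 2])

# Per-group sum of mz * I (numerator of the weighted average)
totals = np.add.reduceat(f['mz']*f['I'], pos)
# Per-group sum of the weights I (denominator)
counts = np.add.reduceat(f['I'], pos)
result = totals / counts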
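As a quick sanity check, the loop and the reduceat version can be compared on the excerpt from the question. This is just a sketch, assuming the excerpt is typed in by hand as a structured array; the names `sample`, `starts`, `vectorized` and `looped` are only for illustration:

```python
import numpy as np

# Excerpt from the question, entered by hand (hypothetical stand-in for f).
sample = np.array(
    [(270.0332, 472), (271.0376, 1936), (272.0443, 11188), (273.0495, 65874),
     (274.0517, 8582), (275.0485, 4081), (276.0523, 659), (286.058, 1078),
     (287.0624, 4927), (288.0696, 22481), (289.0757, 84001), (290.078, 13688),
     (291.0746, 5402), (430.1533, 13995), (431.1577, 2992), (432.1685, 504)],
    dtype=[("mz", "<f8"), ("I", "<i8")],
)

# Group start indices, including 0 for the first group.
dm = np.abs(np.diff(sample["mz"]))
starts = np.flatnonzero(np.r_[True, dm > 2])

# Vectorized: per-group sum(mz * I) divided by per-group sum(I).
vectorized = (np.add.reduceat(sample["mz"] * sample["I"], starts)
              / np.add.reduceat(sample["I"], starts))

# Loop: split at every start except the leading 0, then average each piece.
looped = np.array([np.average(g["mz"], weights=g["I"])
                   for g in np.array_split(sample, starts[1:])])

print(np.allclose(vectorized, looped))
```

Note that reduceat wants the start of every group including index 0, while array_split wants only the interior split points, hence the `starts[1:]` in the loop version.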

Upvotes: 2
