Reputation: 23
I have the following code that generates a list of sub-arrays using np.array_split. I compare consecutive 'mz' values (the first value of each tuple) and start a new sub-array wherever the difference exceeds 2. So far so good.
import numpy as np
f = np.genfromtxt("d_n_isogro_ms.txt", names=True, dtype=None, usecols=(1,-1))
dm = np.absolute(np.diff(f['mz']))
pos = np.where(dm > 2)[0] + 1
fsplit = np.array_split(f, pos)
This is what the sample input looks like (only an excerpt):
[(270.0332, 472) (271.0376, 1936) (272.0443, 11188) (273.0495, 65874)
(274.0517, 8582) (275.0485, 4081) (276.0523, 659) (286.058, 1078)
(287.0624, 4927) (288.0696, 22481) (289.0757, 84001) (290.078, 13688)
(291.0746, 5402) (430.1533, 13995) (431.1577, 2992) (432.1685, 504)]
<type 'numpy.ndarray'>
The position for this particular data is then computed as:
pos = [7,13]
And here is my sample output:
[array([(270.0332, 472), (271.0376, 1936), (272.0443, 11188),
(273.0495, 65874), (274.0517, 8582), (275.0485, 4081),
(276.0523, 659)], dtype=[('mz', '<f8'), ('I', '<i8')]),
array([(286.058, 1078), (287.0624, 4927), (288.0696, 22481),
(289.0757, 84001), (290.078, 13688), (291.0746, 5402)],
dtype=[('mz', '<f8'), ('I', '<i8')]),
array([(430.1533, 13995),
(431.1577, 2992), (432.1685, 504)],
dtype=[('mz', '<f8'), ('I', '<i8')])]
I would like to compute the weighted average of each of these arrays. Is there an efficient way of doing this with numpy? I keep failing at the indexing. Preferably, I would like to use the dtype field names to identify the values ('mz') and the weights ('I').
Maybe one could do the whole operation on the fly.
Thank you very much for your help in advance.
Best, Christian
Upvotes: 2
Views: 167
The output of np.array_split is a Python list containing arrays of unequal lengths. The best you can do in that case is a Python loop:
result = [np.average(f_i['mz'], weights=f_i['I']) for f_i in fsplit]
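As a rough, self-contained sketch of that loop, the 16 sample rows from the question can be rebuilt by hand (an assumption, since the original file d_n_isogro_ms.txt is not available here). The field names 'mz' and 'I' are taken from the dtype shown in the question's output:

```python
import numpy as np

# Sample rows copied from the question's excerpt; this stands in for the
# np.genfromtxt call, whose input file is not available.
rows = [(270.0332, 472), (271.0376, 1936), (272.0443, 11188), (273.0495, 65874),
        (274.0517, 8582), (275.0485, 4081), (276.0523, 659), (286.058, 1078),
        (287.0624, 4927), (288.0696, 22481), (289.0757, 84001), (290.078, 13688),
        (291.0746, 5402), (430.1533, 13995), (431.1577, 2992), (432.1685, 504)]
f = np.array(rows, dtype=[('mz', '<f8'), ('I', '<i8')])

# Split wherever consecutive mz values differ by more than 2.
dm = np.absolute(np.diff(f['mz']))
pos = np.where(dm > 2)[0] + 1
fsplit = np.array_split(f, pos)

# One weighted average per sub-array: 'mz' holds the values, 'I' the weights.
result = [np.average(f_i['mz'], weights=f_i['I']) for f_i in fsplit]
print(result)
```

Each entry of result is a weighted mean, so it has to lie between the smallest and largest mz of its group, which makes for a quick sanity check.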
But it is possible to come up with a completely vectorized solution by using np.add.reduceat instead of np.array_split:
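dm = np.abs(np.diff(f['mz']))
# start index of each group; index 0 always opens the first group
pos = np.flatnonzero(np.r_[True, dm > 2])
# per-group sum of mz * I, and per-group sum of the weights I
totals = np.add.reduceat(f['mz']*f['I'], pos)
counts = np.add.reduceat(f['I'], pos)
result = totals / counts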
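To check that the reduceat version gives the same numbers as the loop, here is a minimal sketch on the sample rows from the question (typed in by hand as an assumption, since the original input file is not available). The names starts, weighted_sums, weight_sums, vectorized and looped are only illustrative:

```python
import numpy as np

# Sample rows copied from the question's excerpt.
rows = [(270.0332, 472), (271.0376, 1936), (272.0443, 11188), (273.0495, 65874),
        (274.0517, 8582), (275.0485, 4081), (276.0523, 659), (286.058, 1078),
        (287.0624, 4927), (288.0696, 22481), (289.0757, 84001), (290.078, 13688),
        (291.0746, 5402), (430.1533, 13995), (431.1577, 2992), (432.1685, 504)]
f = np.array(rows, dtype=[('mz', '<f8'), ('I', '<i8')])

# First index of every group; prepending True makes index 0 a group start.
dm = np.abs(np.diff(f['mz']))
starts = np.flatnonzero(np.r_[True, dm > 2])

# reduceat sums each slice starts[i]:starts[i+1] (the last one runs to the end).
weighted_sums = np.add.reduceat(f['mz'] * f['I'], starts)
weight_sums = np.add.reduceat(f['I'], starts)
vectorized = weighted_sums / weight_sums

# Cross-check against the list-comprehension version.
looped = [np.average(g['mz'], weights=g['I'])
          for g in np.array_split(f, starts[1:])]
print(np.allclose(vectorized, looped))
```

Note that this relies on the groups being contiguous runs of rows, which holds here because they are defined by gaps between neighbouring mz values. A group whose weights sum to zero would produce a division by zero, so that case would need separate handling.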
Upvotes: 2