Reputation: 1156
I have three arrays, all of the same size:
arr1 = np.array([1.4, 3.0, 4.0, 4.0, 7.0, 9.0, 9.0, 9.0])
arr2 = np.array([2.3, 5.0, 2.3, 2.3, 4.0, 6.0, 5.0, 6.0])
data = np.array([5.4, 7.1, 9.5, 1.9, 8.7, 1.8, 6.1, 7.4])
arr1 can take on any float value, while arr2 only takes a few distinct float values. I want to obtain the unique pairs of arr1 and arr2, e.g.
arr1unique = np.array([1.4, 3.0, 4.0, 7.0, 9.0, 9.0])
arr2unique = np.array([2.3, 5.0, 2.3, 4.0, 6.0, 5.0])
For each non-unique pair I need to average the corresponding elements in the data array, e.g. averaging the values 9.5 and 1.9, since the pairs (arr1[2], arr2[2]) and (arr1[3], arr2[3]) are equal. The same holds for the values in data at indices 5 and 7. The data array therefore becomes
dataunique = np.array([5.4, 7.1, 5.7, 8.7, 4.6, 6.1])
Upvotes: 0
Views: 573
Reputation: 10759
Here is a 'pure numpy' solution to the problem. Pure numpy in quotes because it relies on a numpy enhancement proposal which I am still working on, but you can find the full code here:
group_by((arr1, arr2)).mean(data)
Voila, problem solved. Way faster than any of the posted solutions; and much more elegant too, if I may say so myself ;).
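For comparison, here is a rough pure-NumPy sketch of the same grouped mean, assumed equivalent and built only on np.unique and np.bincount rather than on the group_by code above; note that it returns the groups in sorted order rather than in order of first appearance:
import numpy as np

arr1 = np.array([1.4, 3.0, 4.0, 4.0, 7.0, 9.0, 9.0, 9.0])
arr2 = np.array([2.3, 5.0, 2.3, 2.3, 4.0, 6.0, 5.0, 6.0])
data = np.array([5.4, 7.1, 9.5, 1.9, 8.7, 1.8, 6.1, 7.4])

# Stack the two key arrays so that each row is one (arr1, arr2) pair.
pairs = np.stack([arr1, arr2], axis=1)

# Unique rows, plus the group index of every original row.
uniq, inverse = np.unique(pairs, axis=0, return_inverse=True)
inverse = inverse.ravel()   # guard: some NumPy versions return the inverse with an extra axis

arr1unique, arr2unique = uniq[:, 0], uniq[:, 1]

# Grouped mean: per-group sum of data divided by per-group count.
dataunique = np.bincount(inverse, weights=data) / np.bincount(inverse)
# Same means as dataunique in the question, but in sorted pair order.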
Upvotes: 1
Reputation: 63737
All you have to do is create an OrderedDict
whose keys are the pairs of elements from (arr1, arr2) and whose values are lists of the corresponding elements from data. For any duplicate key (pair from arr1 and arr2), the duplicate entries are appended to the list. You can then traverse the values in the dictionary and compute the average of each list. To get the unique keys, just iterate over the keys and split the tuples.
Try the following:
>>> import collections
>>> d = collections.OrderedDict()
>>> for k1, k2, v in zip(arr1, arr2, data):
...     d.setdefault((k1, k2), []).append(v)
...
>>> np.array([np.mean(v) for v in d.values()])
array([ 5.4, 7.1, 5.7, 8.7, 4.6, 6.1])
>>> arr1unique = np.array([e[0] for e in d])
>>> arr2unique = np.array([e[1] for e in d])
Upvotes: 0
Reputation: 362786
defaultdict can help you here:
>>> import numpy as np
>>> arr1 = np.array([1.4, 3.0, 4.0, 4.0, 7.0, 9.0, 9.0, 9.0])
>>> arr2 = np.array([2.3, 5.0, 2.3, 2.3, 4.0, 6.0, 5.0, 6.0])
>>> data = np.array([5.4, 7.1, 9.5, 1.9, 8.7, 1.8, 6.1, 7.4])
>>> from collections import defaultdict
>>> dd = defaultdict(list)
>>> for x1, x2, d in zip(arr1, arr2, data):
... dd[x1, x2].append(d)
...
>>> arr1unique = np.array([x[0] for x in dd.iterkeys()])
>>> arr2unique = np.array([x[1] for x in dd.iterkeys()])
>>> dataunique = np.array([np.mean(x) for x in dd.itervalues()])
>>> print arr1unique
[ 1.4 7. 4. 9. 9. 3. ]
>>> print arr2unique
[ 2.3 4. 2.3 5. 6. 5. ]
>>> print dataunique
[ 5.4 8.7 5.7 6.1 4.6 7.1]
This method gives your answer, but it destroys the ordering. If the ordering is important, you can do basically the same thing with collections.OrderedDict.
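A minimal sketch of that order-preserving variant (an assumption of what's meant; since OrderedDict has no default factory, setdefault takes the place of defaultdict's auto-created list):
from collections import OrderedDict

od = OrderedDict()
for x1, x2, d in zip(arr1, arr2, data):
    # setdefault stands in for defaultdict's automatic empty list
    od.setdefault((x1, x2), []).append(d)

arr1unique = np.array([k[0] for k in od])
arr2unique = np.array([k[1] for k in od])
dataunique = np.array([np.mean(v) for v in od.values()])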
Upvotes: 0
Reputation: 5540
Make a dictionary with the arr1 values as keys and their corresponding arr2 values as values. For each entry saved to the dictionary, generate its dataunique entry. If the key already exists, skip that iteration and continue.
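A minimal sketch of that idea, with two assumptions made explicit: the dictionary key is the (arr1, arr2) pair (keying on arr1 alone would merge distinct pairs that share an arr1 value), and "skip" means the first data value for a pair is kept rather than averaged:
import numpy as np

seen = {}   # maps each (arr1, arr2) pair to its arr2 value
kept = []   # data value of the first occurrence of each pair
for a1, a2, d in zip(arr1, arr2, data):
    if (a1, a2) in seen:
        continue    # key already exists: skip this iteration
    seen[(a1, a2)] = a2
    kept.append(d)

arr1unique = np.array([k[0] for k in seen])
arr2unique = np.array([k[1] for k in seen])
dataunique = np.array(kept)   # first value per pair; duplicates are skipped, not averaged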
Upvotes: 0