Reputation: 1156

Computing mean for non-unique elements of numpy array pairs

I have three arrays, all of the same size:

arr1 = np.array([1.4, 3.0, 4.0, 4.0, 7.0, 9.0, 9.0, 9.0])
arr2 = np.array([2.3, 5.0, 2.3, 2.3, 4.0, 6.0, 5.0, 6.0])
data = np.array([5.4, 7.1, 9.5, 1.9, 8.7, 1.8, 6.1, 7.4])

arr1 can take any float value, while arr2 takes only a few distinct float values. I want to obtain the unique pairs of arr1 and arr2, e.g.

arr1unique = np.array([1.4, 3.0, 4.0, 7.0, 9.0, 9.0])
arr2unique = np.array([2.3, 5.0, 2.3, 4.0, 6.0, 5.0])

For each non-unique pair I need to average the corresponding elements in the data array, e.g. averaging the values 9.5 and 1.9, since the pairs (arr1[2], arr2[2]) and (arr1[3], arr2[3]) are equal. The same holds for the values in data at indices 5 and 7 (0-based). The data array therefore becomes

dataunique = np.array([5.4, 7.1, 5.7, 8.7, 4.6, 6.1])

Upvotes: 0

Answers (4)

Eelco Hoogendoorn

Reputation: 10759

Here is a 'pure numpy' solution to the problem. Pure numpy in quotes because it relies on a numpy enhancement proposal which I am still working on, but you can find the full code here:

http://pastebin.com/c5WLWPbp

group_by((arr1, arr2)).mean(data)

Voila, problem solved. Way faster than any of the posted solutions; and much more elegant too, if I may say so myself ;).
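For readers who cannot use the linked code, the same kind of grouped mean can be sketched with functions that ship with NumPy. This is not the group_by implementation referred to above, only a stand-in under stated assumptions: the helper name group_mean is hypothetical, np.unique needs NumPy 1.13 or later for the axis argument, and the unique pairs come back sorted rather than in order of first appearance.

```python
import numpy as np

def group_mean(arr1, arr2, data):
    """Average data over identical (arr1, arr2) pairs (hypothetical helper).

    The unique pairs are returned in sorted order, not input order.
    """
    # One row per element: column 0 is arr1, column 1 is arr2.
    pairs = np.column_stack((arr1, arr2))
    # inverse[i] is the index of the unique row that pairs[i] maps to.
    unique_pairs, inverse = np.unique(pairs, axis=0, return_inverse=True)
    # Defensive flatten in case the inverse is not returned as 1-D.
    inverse = inverse.ravel()
    sums = np.bincount(inverse, weights=data)   # per-group sum of data
    counts = np.bincount(inverse)               # per-group element count
    return unique_pairs[:, 0], unique_pairs[:, 1], sums / counts

arr1 = np.array([1.4, 3.0, 4.0, 4.0, 7.0, 9.0, 9.0, 9.0])
arr2 = np.array([2.3, 5.0, 2.3, 2.3, 4.0, 6.0, 5.0, 6.0])
data = np.array([5.4, 7.1, 9.5, 1.9, 8.7, 1.8, 6.1, 7.4])

arr1unique, arr2unique, dataunique = group_mean(arr1, arr2, data)
```

Because of the sorting, the pair (9.0, 5.0) ends up before (9.0, 6.0) here, so the last two entries are swapped relative to the order given in the question.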

Upvotes: 1

Abhijit

Reputation: 63737

All you have to do is create an OrderedDict whose keys are the (arr1, arr2) pairs and whose values are lists of the corresponding elements of data. For any duplicate key (pair of arr1 and arr2), the duplicate entries are collected in the same list. You can then traverse the values in the dictionary and compute the averages. To get the unique keys, just iterate over the keys and split the tuples.

Try the following:

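>>> import collections
>>> import numpy as np
>>> # arr1, arr2 and data are the arrays from the question
>>> d = collections.OrderedDict()
>>> for k1, k2, v in zip(arr1, arr2, data):
...     d.setdefault((k1, k2), []).append(v)  # group data values by pair
...
>>> np.array([np.mean(v) for v in d.values()])
array([ 5.4,  7.1,  5.7,  8.7,  4.6,  6.1])

>>> arr1unique = np.array([e[0] for e in d])
>>> arr2unique = np.array([e[1] for e in d])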

Upvotes: 0

wim

Reputation: 362786

defaultdict can help you here:

>>> import numpy as np
>>> arr1 = np.array([1.4, 3.0, 4.0, 4.0, 7.0, 9.0, 9.0, 9.0])
>>> arr2 = np.array([2.3, 5.0, 2.3, 2.3, 4.0, 6.0, 5.0, 6.0])
>>> data = np.array([5.4, 7.1, 9.5, 1.9, 8.7, 1.8, 6.1, 7.4])
>>> from collections import defaultdict
>>> dd = defaultdict(list)
>>> for x1, x2, d in zip(arr1, arr2, data):
...   dd[x1, x2].append(d)
... 
>>> arr1unique = np.array([x[0] for x in dd.iterkeys()])
>>> arr2unique = np.array([x[1] for x in dd.iterkeys()])
>>> dataunique = np.array([np.mean(x) for x in dd.itervalues()])
>>> print arr1unique
[ 1.4  7.   4.   9.   9.   3. ]
>>> print arr2unique
[ 2.3  4.   2.3  5.   6.   5. ]
>>> print dataunique
[ 5.4  8.7  5.7  6.1  4.6  7.1]

This method gives your answer, but does not preserve the original ordering (the session above is Python 2, where plain dicts are unordered). If the ordering is important, you can do basically the same thing with collections.OrderedDict.
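On Python 3.7 and later, a plain dict keeps keys in insertion order, so the same grouping idea preserves the order of first appearance without any special container. The following is a minimal sketch under that assumption; the function name mean_by_pair is hypothetical, and it returns plain lists that can be wrapped in np.array if arrays are needed.

```python
def mean_by_pair(arr1, arr2, data):
    """Group data by (arr1, arr2) pair and average each group.

    Hypothetical helper; relies on dicts preserving insertion order
    (guaranteed from Python 3.7), so pairs come back in the order in
    which they first appear in the input.
    """
    groups = {}
    for x1, x2, d in zip(arr1, arr2, data):
        # Collect every data value that shares the same pair.
        groups.setdefault((x1, x2), []).append(d)
    first = [pair[0] for pair in groups]
    second = [pair[1] for pair in groups]
    means = [sum(values) / len(values) for values in groups.values()]
    return first, second, means

# Works on any equal-length sequences, including NumPy arrays.
arr1 = [1.4, 3.0, 4.0, 4.0, 7.0, 9.0, 9.0, 9.0]
arr2 = [2.3, 5.0, 2.3, 2.3, 4.0, 6.0, 5.0, 6.0]
data = [5.4, 7.1, 9.5, 1.9, 8.7, 1.8, 6.1, 7.4]

arr1unique, arr2unique, dataunique = mean_by_pair(arr1, arr2, data)
```

The explicit Python loop is slower than a vectorized approach for large arrays, but it keeps the pairs in the order asked for in the question.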

Upvotes: 0

Vivek S

Reputation: 5540

Make a dictionary with the arr1 values as keys and store the corresponding arr2 values as the dictionary values. For each entry saved to the dictionary, generate its dataunique entry. If a key already exists, skip that iteration and continue.

Upvotes: 0
