Reputation: 1470
Let's consider a 2d-array A
2 3 5 7
2 3 5 7
1 7 1 4
5 8 6 0
2 3 5 7
The first, second and last lines are identical. The algorithm I'm looking for should return the number of identical rows for each different row (=number of duplicates of each element). If the script can be easily modified to also count the number of identical column also, it would be great.
I use an inefficient naive algorithm to do that:
import numpy
A=numpy.array([[2, 3, 5, 7],[2, 3, 5, 7],[1, 7, 1, 4],[5, 8, 6, 0],[2, 3, 5, 7]])
i=0
end = len(A)
while i<end:
print i,
j=i+1
numberID = 1
while j<end:
print j
if numpy.array_equal(A[i,:] ,A[j,:]):
numberID+=1
j+=1
i+=1
print A, len(A)
Expected result:
array([3,1,1]) # number identical arrays per line
My algo looks like using native python within numpy, thus inefficient. Thanks for help.
Upvotes: 3
Views: 518
Reputation: 67427
In unumpy >= 1.9.0, np.unique
has a return_counts
keyword argument you can combine with the solution here to get the counts:
b = np.ascontiguousarray(A).view(np.dtype((np.void, A.dtype.itemsize * A.shape[1])))
unq_a, unq_cnt = np.unique(b, return_counts=True)
unq_a = unq_a.view(A.dtype).reshape(-1, A.shape[1])
>>> unq_a
array([[1, 7, 1, 4],
[2, 3, 5, 7],
[5, 8, 6, 0]])
>>> unq_cnt
array([1, 3, 1])
In an older numpy, you can replicate what np.unique
does, which would look something like:
a_view = np.array(A, copy=True)
a_view = a_view.view(np.dtype((np.void,
a_view.dtype.itemsize*a_view.shape[1]))).ravel()
a_view.sort()
a_flag = np.concatenate(([True], a_view[1:] != a_view[:-1]))
a_unq = A[a_flag]
a_idx = np.concatenate(np.nonzero(a_flag) + ([a_view.size],))
a_cnt = np.diff(a_idx)
>>> a_unq
array([[1, 7, 1, 4],
[2, 3, 5, 7],
[5, 8, 6, 0]])
>>> a_cnt
array([1, 3, 1])
Upvotes: 3
Reputation: 984
You can lexsort on the row entries, which will give you the indices for traversing the rows in sorted order, making the search O(n) rather than O(n^2). Note that by default, the elements in the last column sort last, i.e. the rows are 'alphabetized' right to left rather than left to right.
In [9]: a
Out[9]:
array([[2, 3, 5, 7],
[2, 3, 5, 7],
[1, 7, 1, 4],
[5, 8, 6, 0],
[2, 3, 5, 7]])
In [10]: lexsort(a.T)
Out[10]: array([3, 2, 0, 1, 4])
In [11]: a[lexsort(a.T)]
Out[11]:
array([[5, 8, 6, 0],
[1, 7, 1, 4],
[2, 3, 5, 7],
[2, 3, 5, 7],
[2, 3, 5, 7]])
Upvotes: 2
Reputation: 1439
You can use Counter
class from collections
module for this.
It works like this :
x = [2, 2, 1, 5, 2]
from collections import Counter
c=Counter(x)
print c
Output : Counter({2: 3, 1: 1, 5: 1})
Only issue you will face is in your case since every value of x is itself a list which is a non hashable data structure. If you can convert every value of x in a tuple that it should works as :
x = [(2, 3, 5, 7),(2, 3, 5, 7),(1, 7, 1, 4),(5, 8, 6, 0),(2, 3, 5, 7)]
from collections import Counter
c=Counter(x)
print c
Output : Counter({(2, 3, 5, 7): 3, (5, 8, 6, 0): 1, (1, 7, 1, 4): 1})
Upvotes: 1