Reputation: 35321
I want to check that a sequence of N numpy vectors of integers is lexicographically ordered. All the vectors in the sequence have shape 1 × 2. (The value of N is big, so I want to avoid sorting this sequence if it is already sorted.)
Does Python, or numpy, already offer a predicate to perform such a test?
(It would not be hard to roll my own, but I prefer to use built-in tools if they exist.)
Upvotes: 0
Views: 337
Reputation: 69242
numpy has lexsort, but this does a sort, not a test of whether the data is sorted. None-the-less, running it on sorted data is about twice as fast as unsorted.
import numpy as np
import timeit
def data(N):
return np.random.randint(0,10,(N,2))
def get_sorted(x):
return x[np.lexsort(x.T)]
x = data(5)
y = get_sorted(x)
print x # to verify lex sorting
print
print y
print
x = data(1000)
y = get_sorted(x)
# to test the time for sorted vs unsorted data
print timeit.timeit("np.lexsort(x.T)", "from __main__ import np, x", number=1000)
print timeit.timeit("np.lexsort(y.T)", "from __main__ import np, y", number=1000)
And here are the results:
[[6 7] # unsorted
[4 3]
[6 7]
[9 2]
[7 3]]
[[9 2] # sorted by the second column first
[4 3]
[7 3]
[6 7]
[6 7]]
0.0788 # time to lex sort 1000x2 unsorted data values
0.0381 # time to lex sort 1000x2 pre-sorted data values
Note also that the speed of python vs numpy will depend on the list, because python can sometime short-circuit its tests. So if you think that your list will generally be unsorted, a pure python solution could figure this out in the first few values, which could be much faster; whereas numpy solutions will generally work the the entire array.
Upvotes: 1
Reputation: 231540
To have an issorted
predicate you need a well defined sort, or at least a clear method of comparing items.
To follow on my question about the nature of your data. It sounds as though you have something like this:
In [130]: x=[[1,3],[3,4],[1,2],[3,1],[0,2],[6,5]]
In [131]: x1=[np.array(i).reshape(1,2) for i in x]
In [132]: x1
Out[132]:
[array([[1, 3]]),
array([[3, 4]]),
array([[1, 2]]),
array([[3, 1]]),
array([[0, 2]]),
array([[6, 5]])]
The Python sort is lexographic - that is, sort of the 1st element on the sublists, and then on the 2nd element.
In [137]: sorted(x)
Out[137]: [[0, 2], [1, 2], [1, 3], [3, 1], [3, 4], [6, 5]]
numpy
sorts don't preserve the pairs - depending on the axis
specification it sorts by column, or row (or flat). But the np.sort
doc does say that complex numbers are sorted lexographically:
In [157]: xj = np.dot(x,[1,1j])
In [158]: xj
Out[158]: array([ 1.+3.j, 3.+4.j, 1.+2.j, 3.+1.j, 0.+2.j, 6.+5.j])
In [159]: np.sort(xj)
Out[159]: array([ 0.+2.j, 1.+2.j, 1.+3.j, 3.+1.j, 3.+4.j, 6.+5.j])
This matches the Python list sort.
If my guess as to your data type is correct, a comparison based test would use something like:
In [167]: [i.__lt__(j) for i,j in zip(x[:-1],x[1:])]
Out[167]: [True, False, True, False, True]
In [168]: xs=sorted(x)
In [169]: [i.__lt__(j) for i,j in zip(xs[:-1],xs[1:])]
Out[169]: [True, True, True, True, True]
That also works for the complex array:
In [173]: xjs=np.sort(xj)
In [174]: [i.__lt__(j) for i,j in zip(xjs[:-1],xjs[1:])]
Out[174]: [True, True, True, True, True]
For large lists I'd try one of the itertools
for short circuiting iteration.
But when applied to the (plain) array, it is clear that the question of whether it is sorted or not needs further specification.
In [172]: [i.__lt__(j) for i,j in zip(x1[:-1],x1[1:])]
Out[172]:
[array([[ True, True]], dtype=bool),
array([[False, False]], dtype=bool),
array([[ True, False]], dtype=bool),
array([[False, True]], dtype=bool),
array([[ True, True]], dtype=bool)]
By the way, a list of (2,1) arrays would look something like this:
[np.array(i).reshape(1,2) for i in x]
[array([[1, 3]]),
array([[3, 4]]),
array([[1, 2]]),
array([[3, 1]]),
array([[0, 2]]),
array([[6, 5]])]
which if turned into an array would have a (6,1,2)
shape. Or did you want a (6,2)
array?
In [179]: np.array(x)
Out[179]:
array([[1, 3],
[3, 4],
[1, 2],
[3, 1],
[0, 2],
[6, 5]])
Upvotes: 1
Reputation: 14878
You can use np.diff and np.any:
A = np.array([[1,2,3], [2,3,1], [3, 4, 5]])
diff = np.diff(A, axis=0)
print np.all(diff>=0, axis=0)
Upvotes: 1