Reputation: 1601
There is a 2D numpy array of about 500000 rows by 512 values each row:
[
[1,0,1,...,0,0,1], # 512 1's or 0's
[0,1,0,...,0,1,1],
...
[0,0,1,...,1,0,1], # row number 500000
]
How to sort the rows ascending as if each row is a long 512-bit integer?
[
[0,0,1,...,1,0,1],
[0,1,0,...,0,1,1],
[1,0,1,...,0,0,1],
...
]
Upvotes: 4
Views: 1379
Reputation: 102
This is slow but does the job.
def sort_col(arr, col_num=0):
# if we have sorted over all columns return array
if col_num >= arr.shape[1]:
return arr
# sort array over given column
arr_sorted = arr[arr[:, col_num].argsort()]
# if the number of 1s in the given column is not equal to the total number
# of rows neither equal to 0, split on 1 and 0, sort and then merge
if len(arr) > np.sum(arr_sorted[:, col_num]) > 0:
arr_sorted0s = sort_col(arr_sorted[arr_sorted[:, col_num]==0], col_num+1)
arr_sorted1s = sort_col(arr_sorted[arr_sorted[:, col_num]==1], col_num+1)
# change order of stacking if you want ascenting order
return np.vstack((arr_sorted0s, arr_sorted1s))
# if the number of 1s in the given column is equal to the total number
# of rows or equal to 0, just go to the next iteration
return sort_col(arr_sorted, col_num + 1)
np.random.seed(0)
a = np.random.randint(0, 2, (5, 4))
print(a)
print(sort_col(a))
# prints
[[0 1 1 0]
[1 1 1 1]
[1 1 1 0]
[0 1 0 0]
[0 0 0 1]]
[[0 0 0 1]
[0 1 0 0]
[0 1 1 0]
[1 1 1 0]
[1 1 1 1]]
Edit. Or better yet use Daniels solution. I didn't check for new answers before I posted my code.
Upvotes: 0
Reputation: 14399
Instead of converting to strings you can also use a void
view (as from @Jaime here) of the data and argsort
by that.
def sort_bin(b):
b_view = np.ascontiguousarray(b).view(np.dtype((np.void, b.dtype.itemsize * b.shape[1])))
return b[np.argsort(b_view.ravel())] #as per Divakar's suggestion
Testing
np.random.seed(0)
b = np.random.randint(0, 2, (10,5))
print(b)
print(sort_bin(b))
[[0 1 1 0 1]
[1 1 1 1 1]
[1 0 0 1 0]
...,
[1 0 1 1 0]
[0 1 0 1 1]
[1 1 1 0 1]]
[[0 0 0 0 1]
[0 1 0 1 1]
[0 1 1 0 0]
...,
[1 1 1 0 1]
[1 1 1 1 0]
[1 1 1 1 1]]
Should be much faster and less memory-intensive since b_view
is just a view into b
t = np.random.randint(0,2,(2000,512))
%timeit sort_bin(t)
100 loops, best of 3: 3.09 ms per loop
%timeit np.array([[int(i) for i in r] for r in np.sort(np.apply_along_axis(lambda r: ''.join([str(c) for c in r]), 0, t))])
1 loop, best of 3: 3.29 s per loop
About 1000x faster actually
Upvotes: 4
Reputation: 20434
string
of each row
and then applying np.sort()
So if we have an array
to test on:
a = np.array([[1,0,0,0],[0,0,0,0],[1,1,1,1],[0,0,1,1]])
We can create strings
of each row
by using np.apply_along_axis
:
a = np.apply_along_axis(lambda r: ''.join([str(c) for c in r]), 0, a)
which would make a
now:
array(['1010', '0010', '0011', '0011'], dtype='<U4')
and so now we can sort
the strings
with np.sort()
:
a = np.sort(a)
making a
:
array(['0010', '0011', '0011', '1010'], dtype='<U4')
we can then convert back to the original format with:
a = np.array([[int(i) for i in r] for r in a])
which makes a
:
array([[0, 0, 1, 0],
[0, 0, 1, 1],
[0, 0, 1, 1],
[1, 0, 1, 0]])
And if you wanted to cram this all into one line:
a = np.array([[int(i) for i in r] for r in np.sort(np.apply_along_axis(lambda r: ''.join([str(c) for c in r]), 0, a))])
Upvotes: 0
Reputation: 5582
You could sort them in a stable way 512 times, starting with the right-most bit first.
A smaller example: assume you want to sort these three 2-bit numbers by bits:
11
01
00
In the first step, you sort by the right bit, resulting in:
00
11
01
Now you sort by the first bit, in this case we have two 0s
in that column. If your sorting algorithm is not stable it would be allowed to put these equal items in any order in the result, that could cause 01
to appear before 00
which we do not want, so we use a stable sort, keeping the relative order of equal items, for the first column, resulting in the desired:
00
01
11
Upvotes: 0