Physicist

Reputation: 3048

Re-arranging entries in a 2D array based on certain columns

Suppose I have an M x N numpy array where each row represents a data entry, the first N-1 columns represent different parameters (independent variables), and the last column represents the data I'm interested in (dependent variable).

What's the most elegant way to re-arrange different rows such that they are sorted by the parameters?

Example:

# original
1                        0.1                      20                       0.30000000000000004      0.07819319717404902     
1                        1                        10                       0.2                      0.07550707294415204      
2                        0.1                      0                        0                        0.07078663749666488      
2                        0.1                      0                        0.1                      0.07284943819285646      
1                        1                        15                       0.4                      0.08047398714777267      
1                        1                        15                       0.5                      0.0820402298018169      
1                        1                        15                       0.30000000000000004      0.07819319717406738     
1                        1                        20                       0                        0.07079655446543297      
1                        1                        20                       0.1                      0.07286704639139795      
1                        1                        5                        0.4                       0.086521872154



# desired:
1                        0.1                      20                       0.30000000000000004      0.07819319717404902     
1                        1                        5                        0.4                       0.086521872154
1                        1                        10                       0.2                      0.07550707294415204      
1                        1                        15                       0.30000000000000004      0.07819319717406738
1                        1                        15                       0.4                      0.08047398714777267      
1                        1                        15                       0.5                      0.0820402298018169      
1                        1                        20                       0                        0.07079655446543297      
1                        1                        20                       0.1                      0.07286704639139795      
2                        0.1                      0                        0                        0.07078663749666488      
2                        0.1                      0                        0.1                      0.07284943819285646 

I want the rows sorted in ascending order by each parameter, with the first column taking priority, then the second, and so on.

Upvotes: 2

Views: 85

Answers (3)

Massifox

Reputation: 4487

To sort an ndarray on a single column, use np.argsort

Given the following matrix:

m = np.array([[5., 0.1, 3.4],
           [7., 0.3, 6.8],
           [3., 0.2, 5.6]])

This code sorts the matrix m based on column 0:

m[m[:,0].argsort(kind='mergesort')]

Result:

array([[3. , 0.2, 5.6],
       [5. , 0.1, 3.4],
       [7. , 0.3, 6.8]])
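
As a side note on why a stable sort kind such as 'mergesort' is useful here: because ties keep their existing order, single-column sorts can be chained to sort on several columns. The sketch below uses a made-up array `m_ties` (not from the question) and sorts on the secondary column first, then on the primary one:

```python
import numpy as np

# Hypothetical data with a tie in column 0 (two rows start with 5.).
m_ties = np.array([[5., 0.2, 3.4],
                   [3., 0.3, 6.8],
                   [5., 0.1, 5.6]])

# Step 1: sort by the secondary key (column 1).
step1 = m_ties[m_ties[:, 1].argsort(kind='mergesort')]

# Step 2: stable sort by the primary key (column 0).
# Rows that tie on column 0 keep the column-1 order from step 1.
chained = step1[step1[:, 0].argsort(kind='mergesort')]

print(chained)
```

For more than two columns, np.lexsort is usually the more direct tool.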

To sort an ndarray on multiple columns, use np.lexsort

Given:

a = np.array([[1,20,200], [1,30,100], [1,10,300]])
array([[  1,  20, 200],
       [  1,  30, 100],
       [  1,  10, 300]])

Order by column 1, then by column 0 to break ties:

a[np.lexsort((a[:,0],a[:,1]))]
# output:
array([[  1,  10, 300],
       [  1,  20, 200],
       [  1,  30, 100]])

NOTE: The last key passed to np.lexsort (or the last row, if keys is a 2D array) is the primary sort key.
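
To make the key-order rule concrete, here is a small sketch with a made-up array `b` (not from the question) in which swapping the order of the keys changes the result:

```python
import numpy as np

# Hypothetical data: column 0 and column 1 disagree about the ordering.
b = np.array([[2, 10],
              [1, 30],
              [1, 20]])

# Column 0 is passed last, so it is the primary key; column 1 breaks ties.
by_col0_then_col1 = b[np.lexsort((b[:, 1], b[:, 0]))]

# Column 1 is passed last, so it is the primary key; column 0 breaks ties.
by_col1_then_col0 = b[np.lexsort((b[:, 0], b[:, 1]))]

print(by_col0_then_col1)
print(by_col1_then_col0)
```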

Order by all columns, with the rightmost column as the primary key:

a[np.lexsort((a[:,0], a[:,1],a[:,2]))]
# output:
array([[  1,  30, 100],
       [  1,  20, 200],
       [  1,  10, 300]])

Or, equivalently, order by all columns without listing them manually (the rightmost column is again the primary key, then the next one to its left, and so on):

a[np.lexsort(list(map(tuple,np.column_stack(a))))]
# output:
array([[  1,  30, 100],
       [  1,  20, 200],
       [  1,  10, 300]])
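
Since np.lexsort also accepts a 2D array of keys (one key per row), the same idea can be sketched more compactly with the transpose. The variable names `right_to_left` and `left_to_right` are only illustrative; reversing the rows of the transpose makes column 0 the primary key, which is the ordering the question asks for:

```python
import numpy as np

a = np.array([[1, 20, 200], [1, 30, 100], [1, 10, 300]])

# Each row of a.T is a column of a; the last row (rightmost column) is primary.
right_to_left = a[np.lexsort(a.T)]

# Reversing the rows of a.T makes column 0 primary, then column 1, then column 2.
left_to_right = a[np.lexsort(a.T[::-1])]

print(right_to_left)
print(left_to_right)
```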

Another option: is pandas a good idea for your specific problem?

You could also switch to pandas. It works, but it is several orders of magnitude slower. Here are some timing tests:

Benchmark data:

a = np.array([[1,20,200]*1000, [1,30,100]*1000, [1,10,300]*1000])

Pandas version:

%%timeit
pd.DataFrame(a).sort_values(list(range(a.shape[1]))).values
# 3.66 s ± 110 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Numpy version:

%%timeit
a[np.lexsort((a[:,0], a[:,1],a[:,2]))]
# 39.6 µs ± 12.1 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

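As you can see, you go from microseconds with numpy to seconds with the pandas version (roughly 90,000 times slower in this test, or about five orders of magnitude). Note that the benchmark array has shape (3, 3000), so the pandas call sorts on all 3000 columns while the lexsort call uses only the first three as keys; the two timings therefore do not measure the same work.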
The choice is yours :)
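
For a like-for-like comparison, one possible sketch is shown below. It assumes a hypothetical array `data` with 3000 rows and 3 columns (not the data from the question), sorts on the same keys in the same priority order with both libraries, and checks that the results agree before timing. No timings are quoted here because they depend on the machine and library versions.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical benchmark data: 3000 rows, 3 integer columns with many ties.
data = rng.integers(0, 10, size=(3000, 3))

def sort_pandas(x):
    # Sort by column 0, then column 1, then column 2.
    return pd.DataFrame(x).sort_values(list(range(x.shape[1]))).values

def sort_numpy(x):
    # lexsort treats the last key as primary, so reverse the column order.
    return x[np.lexsort(x.T[::-1])]

# Both approaches should give the same row order.
same_result = np.array_equal(sort_pandas(data), sort_numpy(data))

print("results agree:", same_result)
print("pandas:", timeit.timeit(lambda: sort_pandas(data), number=100))
print("numpy: ", timeit.timeit(lambda: sort_numpy(data), number=100))
```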

Upvotes: 1

Quang Hoang

Reputation: 150735

One option that uses pandas's sort_values:

pd.DataFrame(a).sort_values(list(range(a.shape[1]))).values

Output:

array([[ 1.        ,  0.1       , 20.        ,  0.3       ,  0.0781932 ],
       [ 1.        ,  1.        ,  5.        ,  0.4       ,  0.08652187],
       [ 1.        ,  1.        , 10.        ,  0.2       ,  0.07550707],
       [ 1.        ,  1.        , 15.        ,  0.3       ,  0.0781932 ],
       [ 1.        ,  1.        , 15.        ,  0.4       ,  0.08047399],
       [ 1.        ,  1.        , 15.        ,  0.5       ,  0.08204023],
       [ 1.        ,  1.        , 20.        ,  0.        ,  0.07079655],
       [ 1.        ,  1.        , 20.        ,  0.1       ,  0.07286705],
       [ 2.        ,  0.1       ,  0.        ,  0.        ,  0.07078664],
       [ 2.        ,  0.1       ,  0.        ,  0.1       ,  0.07284944]])

Upvotes: 1

Paul Panzer

Reputation: 53029

You can use lexsort:

original[np.lexsort(np.rot90(original))]
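
As a sketch of why this works: np.rot90 rotates the array counterclockwise, so the last column becomes the first row and the first column becomes the last row, which np.lexsort treats as the primary key. The array below is a shortened, rounded stand-in for the question's data, and the parameters-only variant is an illustrative addition:

```python
import numpy as np

# Shortened, rounded stand-in for the question's data (illustrative only).
original = np.array([
    [1, 1.0, 15, 0.4, 0.0805],
    [2, 0.1,  0, 0.1, 0.0728],
    [1, 0.1, 20, 0.3, 0.0782],
    [1, 1.0,  5, 0.4, 0.0865],
    [2, 0.1,  0, 0.0, 0.0708],
])

# rot90 gives the columns as rows in reverse order: last column first,
# column 0 last. lexsort uses the last row (column 0) as the primary key.
sorted_rows = original[np.lexsort(np.rot90(original))]

# The rotation is the same as reversing the rows of the transpose.
same_keys = np.array_equal(np.rot90(original), original.T[::-1])

# Variant that uses only the first N-1 (parameter) columns as keys,
# leaving the dependent variable out of the comparison.
params_only = original[np.lexsort(original[:, -2::-1].T)]

print(sorted_rows)
```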

Upvotes: 1
