Reputation: 3048

re-arranging entries in 2d array based on certain columns

Suppose I have an M x N numpy array where each row represents a data entry, the first N-1 columns represent different parameters (independent variables), and the last column represents the data I'm interested in (dependent variable).

What's the most elegant way to re-arrange different rows such that they are sorted by the parameters?

Example:

# original
1                        0.1                      20                       0.30000000000000004      0.07819319717404902     
1                        1                        10                       0.2                      0.07550707294415204      
2                        0.1                      0                        0                        0.07078663749666488      
2                        0.1                      0                        0.1                      0.07284943819285646      
1                        1                        15                       0.4                      0.08047398714777267      
1                        1                        15                       0.5                      0.0820402298018169      
1                        1                        15                       0.30000000000000004      0.07819319717406738     
1                        1                        20                       0                        0.07079655446543297      
1                        1                        20                       0.1                      0.07286704639139795      
1                        1                        5                        0.4                      0.086521872154



# desired:
1                        0.1                      20                       0.30000000000000004      0.07819319717404902     
1                        1                        5                        0.4                      0.086521872154
1                        1                        10                       0.2                      0.07550707294415204      
1                        1                        15                       0.30000000000000004      0.07819319717406738
1                        1                        15                       0.4                      0.08047398714777267      
1                        1                        15                       0.5                      0.0820402298018169      
1                        1                        20                       0                        0.07079655446543297      
1                        1                        20                       0.1                      0.07286704639139795      
2                        0.1                      0                        0                        0.07078663749666488      
2                        0.1                      0                        0.1                      0.07284943819285646

I want the rows sorted in ascending order by each parameter, with the first column taking precedence, then the second, and so on.

Upvotes: 2

Answers (3)

Massifox

Reputation: 4487

To sort an ndarray by a single column, use np.argsort

Given the following matrix:

m = np.array([[5., 0.1, 3.4],
           [7., 0.3, 6.8],
           [3., 0.2, 5.6]])

This code sorts the matrix m based on column 0:

m[m[:,0].argsort(kind='mergesort')]

Result:

array([[3. , 0.2, 5.6],
       [5. , 0.1, 3.4],
       [7. , 0.3, 6.8]])
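Because kind='mergesort' (or kind='stable') is a stable sort, the single-column approach can be chained to sort by several columns: sort by the least significant column first and the most significant column last. The following is a minimal sketch of that idea; the array m here is made-up illustration data with ties in column 0, not the matrix above.

```python
import numpy as np

# Hypothetical data with ties in column 0.
m = np.array([[2., 0.3, 6.8],
              [1., 0.2, 5.6],
              [2., 0.1, 3.4],
              [1., 0.4, 1.2]])

# Goal: sort by column 0 (primary), then column 1 (secondary).
# Apply the least significant key first, the most significant key last.
by_col1 = m[m[:, 1].argsort(kind='stable')]
by_col0_then_col1 = by_col1[by_col1[:, 0].argsort(kind='stable')]

# The same ordering in a single call; lexsort takes the primary key last.
same = m[np.lexsort((m[:, 1], m[:, 0]))]

print(by_col0_then_col1)
```

The stable sort matters here: the second pass must not disturb the column 1 order among rows that tie on column 0.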

To sort an ndarray by multiple columns, use np.lexsort

Given:

a = np.array([[1,20,200], [1,30,100], [1,10,300]])
array([[  1,  20, 200],
       [  1,  30, 100],
       [  1,  10, 300]])

Order by column 1, breaking ties with column 0:

a[np.lexsort((a[:,0],a[:,1]))]
# output:
array([[  1,  10, 300],
       [  1,  20, 200],
       [  1,  30, 100]])

NOTE: np.lexsort treats the last key in the sequence (or the last row, if keys is a 2D array) as the primary sort key.
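The key order is easy to get backwards, so here is a small sketch that shows it in isolation. The arrays first and second are hypothetical names chosen for illustration.

```python
import numpy as np

first = np.array([1, 1, 2, 2])
second = np.array([9, 3, 7, 1])

# The LAST key is the primary one: sort by `first`, break ties with `second`.
idx_a = np.lexsort((second, first))

# Swapping the keys makes `second` the primary key instead.
idx_b = np.lexsort((first, second))

print(idx_a)  # indices that order the pairs by (first, second)
print(idx_b)  # indices that order the pairs by (second, first)
```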

Order by all columns, with the rightmost column as the primary key:

a[np.lexsort((a[:,0], a[:,1],a[:,2]))]
# output:
array([[  1,  30, 100],
       [  1,  20, 200],
       [  1,  10, 300]])

Or equivalently, order by all columns without listing them manually (column priority again runs from right to left, because lexsort treats the last row of a 2D keys array as the primary key):

a[np.lexsort(a.T)]
# output:
array([[  1,  30, 100],
       [  1,  20, 200],
       [  1,  10, 300]])
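The question asks for the opposite priority (column 0 first) and only for the parameter columns, so the keys need to be passed in reverse. Below is a minimal sketch of a helper that does this; sort_rows_by and the sample data are hypothetical, not part of NumPy or of the question.

```python
import numpy as np

def sort_rows_by(arr, columns):
    """Return rows of arr sorted by the given columns, highest priority first."""
    # lexsort expects the primary key last, so reverse the requested order.
    keys = tuple(arr[:, c] for c in reversed(columns))
    return arr[np.lexsort(keys)]

# Made-up data: two parameter columns followed by one data column.
data = np.array([[2, 0.1, 0.5],
                 [1, 1.0, 0.7],
                 [1, 0.1, 0.9],
                 [1, 0.5, 0.6]])

# Sort by every column except the last, left to right.
param_columns = list(range(data.shape[1] - 1))
sorted_data = sort_rows_by(data, param_columns)
print(sorted_data)
```

Passing an explicit column list also covers the "certain columns" case, for example sort_rows_by(data, [1, 0]) to make column 1 the primary key.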

Other option: is pandas a good idea for your specific problem?

Another option is to switch to pandas. It works, but in the test below it was several orders of magnitude slower. Here are some timing tests:

Benchmark data:

a = np.array([[1,20,200]*1000, [1,30,100]*1000, [1,10,300]*1000])

Pandas version:

%%timeit
pd.DataFrame(a).sort_values(list(range(a.shape[1]))).values
# 3.66 s ± 110 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Numpy version:

%%timeit
a[np.lexsort((a[:,0], a[:,1],a[:,2]))]
# 39.6 µs ± 12.1 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

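As you can see, the numpy version runs in microseconds while the pandas version takes seconds, roughly 90,000 times slower in this test (3.66 s vs 39.6 µs). Keep in mind that the two snippets do not do the same amount of work: a has shape (3, 3000), so the pandas call sorts on all 3000 columns while the lexsort call uses only the first three as keys.
The choice is yours :)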
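For a like-for-like comparison, both versions should sort the same array on the same keys. The sketch below is one way to set that up outside a notebook, using the standard timeit module; the data shape, the random values, and the function names with_numpy and with_pandas are assumptions made for illustration, and no timings are claimed here since they depend on the machine and library versions.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical test data: many rows, 5 columns, small value range to force ties.
data = rng.integers(0, 10, size=(20_000, 5))

def with_numpy(arr):
    # Column 0 is the primary key, so pass the columns to lexsort in reverse.
    return arr[np.lexsort(arr.T[::-1])]

def with_pandas(arr):
    # sort_values takes the columns in priority order, highest first.
    return pd.DataFrame(arr).sort_values(list(range(arr.shape[1]))).to_numpy()

# Check that both produce the same ordering before timing them.
assert np.array_equal(with_numpy(data), with_pandas(data))

for fn in (with_numpy, with_pandas):
    seconds = min(timeit.repeat(lambda: fn(data), number=1, repeat=5))
    print(f"{fn.__name__}: {seconds * 1e3:.2f} ms")
```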

Upvotes: 1

Quang Hoang

Reputation: 150735

One option that uses pandas's sort_values:

pd.DataFrame(a).sort_values(list(range(a.shape[1]))).values

Output:

array([[ 1.        ,  0.1       , 20.        ,  0.3       ,  0.0781932 ],
       [ 1.        ,  1.        ,  5.        ,  0.4       ,  0.08652187],
       [ 1.        ,  1.        , 10.        ,  0.2       ,  0.07550707],
       [ 1.        ,  1.        , 15.        ,  0.3       ,  0.0781932 ],
       [ 1.        ,  1.        , 15.        ,  0.4       ,  0.08047399],
       [ 1.        ,  1.        , 15.        ,  0.5       ,  0.08204023],
       [ 1.        ,  1.        , 20.        ,  0.        ,  0.07079655],
       [ 1.        ,  1.        , 20.        ,  0.1       ,  0.07286705],
       [ 2.        ,  0.1       ,  0.        ,  0.        ,  0.07078664],
       [ 2.        ,  0.1       ,  0.        ,  0.1       ,  0.07284944]])

Upvotes: 1

Paul Panzer

Reputation: 53029

You can use lexsort:

original[np.lexsort(np.rot90(original))]
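This works because np.rot90 turns the columns into rows in reverse order, so the first column ends up as the last row, which lexsort treats as the primary key. Here is a short sketch that checks this on a subset of the rows from the question (the choice of six rows is only to keep the example short):

```python
import numpy as np

# Six of the rows from the question.
original = np.array([
    [1, 0.1, 20, 0.30000000000000004, 0.07819319717404902],
    [1, 1,   10, 0.2,                 0.07550707294415204],
    [2, 0.1, 0,  0,                   0.07078663749666488],
    [2, 0.1, 0,  0.1,                 0.07284943819285646],
    [1, 1,   15, 0.4,                 0.08047398714777267],
    [1, 1,   5,  0.4,                 0.086521872154],
])

keys = np.rot90(original)
# rot90 gives the transposed array with its rows reversed:
# row 0 of keys is the last column, the last row of keys is column 0.
assert np.array_equal(keys, original.T[::-1])

order = np.lexsort(keys)
result = original[order]
print(result)
```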

Upvotes: 1
