grouping numpy arrays based on value similarity given the indices

Question

I have a couple of numpy arrays:

orig = [[28021.22333333,  6585.53333333,     0. ],
 [28021.22333333,  6585.53333333,     0.        ],
 [26723.52333333,  6587.48666667,     0.        ],
 [26723.52333333,  6587.48666667,     0.        ],
 [26063.11,       13089.56,           0.        ],
 [26063.11,       13089.56,           0.        ],
 [27424.91,       13091.4,            0.        ],
 [27424.91,       13091.4,            0.        ],
 [28833.60333333, 12641.65333333,     0.        ],
 [28833.60333333, 12641.65333333,     0.        ],
 [26125.33,        7954.18166667,     0.        ],
 [26125.33,        7954.18166667,     0.        ],
 [26121.29666667,  7956.72633333,     0.        ],
 [26121.29666667,  7956.72633333,     0.        ],
 [26116.26,        7957.80833333,     0.        ],
 [26116.26,        7957.80833333,     0.        ],
 [26110.98333333,  7957.263,          0.        ],
 [26110.98333333,  7957.263,          0.        ],
 [26106.27,        7955.17333333,     0.        ],
 [26106.27,        7955.17333333,     0.        ],
 [26102.84,        7951.85733333,     0.        ],
 [26102.84,        7951.85733333,     0.        ]]

and

idxs = [ 0, 1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 12, 13, 14, 15, 17, 18, 20, 21]

tri = [731, 703, 703, 731, 731, 731, 731, 693, 673, 699, 689, 731, 727, 731, 731, 731, 731, 731, 730]

pnts = [[28035.61081192,  6657.82528209,  2800.  ],
 [27951.42292993,  6561.84728091,  2800.        ],
 [28076.63625815,  6536.92743701,  2800.        ],
 [28139.0775588,   6773.36600593,  2800.        ],
 [27990.76839321,  6805.17674429,  2800.        ],
 [27856.70943257,  6734.2138896,   2800.        ],
 [27799.62835447,  6593.68175023,  2800.        ],
 [27846.23402973,  6449.33687603,  2800.        ],
 [27974.71914494,  6368.71983786,  2800.        ],
 [28124.96408673,  6389.55224384,  2800.        ],
 [28226.66757706,  6502.08637967,  2800.        ],
 [28232.24142249,  6653.66627254,  2800.        ],
 [28382.4101748,  6673.10904354,  2800.        ],
 [28315.56368133,  6812.44564901,  2800.        ],
 [28197.8230677,   6912.54705367,  2800.        ],
 [28049.54675563,  6956.10481526,  2800.        ],
 [27896.37306654,  6935.58740108,  2800.        ],
 [27764.78712281,  6854.54245845,  2800.        ],
 [27677.54132953,  6726.98339422,  2800.        ]]

How can I group the values in idxs, tri and pnts according to idxs, whose entries are row indices into orig, so that entries pointing at rows of orig with identical values end up in the same group? For example, I would like to get:

idxs = [[0,1], [2,3], [4,5], [7], [8,9], [10,11], [12,13], [14,15], [17], [18], [20,21]]

tri = [[731, 703], [703, 731], [731, 731], [731], [693, 673], [699, 689], [731, 727], [731, 731], [731], [731], [731, 730]]

and

pnts = [[[28035.61081192,  6657.82528209,  2800.  ],
     [27951.42292993,  6561.84728091,  2800.        ]],
     [[28076.63625815,  6536.92743701,  2800.        ],
     [28139.0775588,   6773.36600593,  2800.        ]],
     [[27990.76839321,  6805.17674429,  2800.        ],
     [27856.70943257,  6734.2138896,   2800.        ]],
     [[27799.62835447,  6593.68175023,  2800.        ]],
     [[27846.23402973,  6449.33687603,  2800.        ],
     [27974.71914494,  6368.71983786,  2800.        ]],
     [[28124.96408673,  6389.55224384,  2800.        ],
     [28226.66757706,  6502.08637967,  2800.        ]],
     [[28232.24142249,  6653.66627254,  2800.        ],
     [28382.4101748,  6673.10904354,  2800.        ]],
     [[28315.56368133,  6812.44564901,  2800.        ],
     [28197.8230677,   6912.54705367,  2800.        ]],
     [[28049.54675563,  6956.10481526,  2800.        ]],
     [[27896.37306654,  6935.58740108,  2800.        ]],
     [[27764.78712281,  6854.54245845,  2800.        ],
     [27677.54132953,  6726.98339422,  2800.        ]]]

I tried numpy.split(), but I couldn't find the right condition to split on. Also, in the end I will have to apply the same operation to corresponding arrays with several million entries.

Ehsan · Accepted Answer

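This is what you want, using the numpy_indexed package:

import numpy as np
import numpy_indexed as npi

# Make sure everything is a NumPy array (plain lists do not support orig[idxs])
orig, idxs, tri, pnts = map(np.asarray, (orig, idxs, tri, pnts))

# Group by the rows of orig that idxs points to, then split each array the same way
eq = npi.group_by(orig[idxs])
print(eq.split(idxs))
print(eq.split(tri))
print(eq.split(pnts))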

As the output below shows, the groups do not come back in their original order; you can sort them if you would like.

output:

#idxs
[array([0, 1]), array([20, 21]), array([8, 9]), array([14, 15]), array([2, 3]), array([17]), array([12, 13]), array([18]), array([4, 5]), array([7]), array([10, 11])]
#tri
[array([731, 703]), array([731, 730]), array([693, 673]), array([731, 731]), array([703, 731]), array([731]), array([731, 727]), array([731]), array([731, 731]), array([731]), array([699, 689])]
#pnts
[array([[28035.61081192,  6657.82528209,  2800.        ],
       [27951.42292993,  6561.84728091,  2800.        ]]), array([[27764.78712281,  6854.54245845,  2800.        ],
       [27677.54132953,  6726.98339422,  2800.        ]]), array([[27846.23402973,  6449.33687603,  2800.        ],
       [27974.71914494,  6368.71983786,  2800.        ]]), array([[28315.56368133,  6812.44564901,  2800.        ],
       [28197.8230677 ,  6912.54705367,  2800.        ]]), array([[28076.63625815,  6536.92743701,  2800.        ],
       [28139.0775588 ,  6773.36600593,  2800.        ]]), array([[28049.54675563,  6956.10481526,  2800.        ]]), array([[28232.24142249,  6653.66627254,  2800.        ],
       [28382.4101748 ,  6673.10904354,  2800.        ]]), array([[27896.37306654,  6935.58740108,  2800.        ]]), array([[27990.76839321,  6805.17674429,  2800.        ],
       [27856.70943257,  6734.2138896 ,  2800.        ]]), array([[27799.62835447,  6593.68175023,  2800.        ]]), array([[28124.96408673,  6389.55224384,  2800.        ],
       [28226.66757706,  6502.08637967,  2800.        ]])]
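If the original order matters, one way to restore it for all three results at once is to reorder the groups by the smallest index in each idxs group. The following is only a sketch: the helper name reorder_by_first_index is hypothetical, and the hand-built groups stand in for a few of the arrays returned by eq.split above.

```python
import numpy as np

def reorder_by_first_index(idx_groups, *other_groups):
    """Reorder parallel lists of groups by the smallest index in each idx group.

    Hypothetical helper: idx_groups and every entry of other_groups are
    assumed to be lists of arrays in the same group order.
    """
    order = np.argsort([np.min(g) for g in idx_groups], kind="stable")
    return [[groups[i] for i in order] for groups in (idx_groups, *other_groups)]

# A few groups in the shuffled order shown in the output above
idx_groups = [np.array([0, 1]), np.array([20, 21]), np.array([8, 9]), np.array([7])]
tri_groups = [np.array([731, 703]), np.array([731, 730]), np.array([693, 673]), np.array([731])]

idx_sorted, tri_sorted = reorder_by_first_index(idx_groups, tri_groups)
print([g.tolist() for g in idx_sorted])  # [[0, 1], [7], [8, 9], [20, 21]]
print([g.tolist() for g in tri_sorted])  # [[731, 703], [731], [693, 673], [731, 730]]
```

The same call accepts the pnts groups as a further argument, so all three lists stay aligned.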

If you want to convert them to lists (note that NumPy cannot store ragged, non-rectangular groups like the ones above in a regular array):

print(sorted([l.tolist() for l in eq.split(idxs)]))

output:

[[0, 1], [2, 3], [4, 5], [7], [8, 9], [10, 11], [12, 13], [14, 15], [17], [18], [20, 21]]
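Since the question mentions numpy.split(), here is a sketch of a plain NumPy alternative that keeps the original order. It assumes that equal rows are adjacent in orig[idxs], which holds for the data in the question but may not hold in general, and it compares rows for exact equality. The helper name split_on_row_change is hypothetical, and the arrays below are a shortened subset of the question's data.

```python
import numpy as np

def split_on_row_change(keys, *arrays):
    """Split each array wherever consecutive rows of keys differ.

    Assumes equal rows of keys are adjacent; rows are compared exactly.
    """
    keys = np.asarray(keys)
    # True at position i when row i+1 differs from row i in any column
    changed = np.any(keys[1:] != keys[:-1], axis=1)
    # Split positions are the starts of the new runs
    cuts = np.flatnonzero(changed) + 1
    return [np.split(np.asarray(a), cuts) for a in arrays]

# Shortened subset of the question's data (first six rows of orig)
orig = np.array([[28021.22333333,  6585.53333333, 0.],
                 [28021.22333333,  6585.53333333, 0.],
                 [26723.52333333,  6587.48666667, 0.],
                 [26723.52333333,  6587.48666667, 0.],
                 [26063.11,       13089.56,       0.],
                 [26063.11,       13089.56,       0.]])
idxs = np.array([0, 1, 2, 4, 5])
tri = np.array([731, 703, 703, 731, 731])

idx_groups, tri_groups = split_on_row_change(orig[idxs], idxs, tri)
print([g.tolist() for g in idx_groups])  # [[0, 1], [2], [4, 5]]
print([g.tolist() for g in tri_groups])  # [[731, 703], [703], [731, 731]]
```

Because np.split works along the first axis, a 2-D array such as pnts can be passed as another argument and is split into blocks of rows. Everything is vectorized apart from the final split, so this should scale reasonably to arrays with millions of rows, though that has not been measured here.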
