ttsesm

Reputation: 947

grouping numpy arrays based on value similarity given the indices

I have a couple of numpy arrays:

orig = [[28021.22333333,  6585.53333333,     0. ],
 [28021.22333333,  6585.53333333,     0.        ],
 [26723.52333333,  6587.48666667,     0.        ],
 [26723.52333333,  6587.48666667,     0.        ],
 [26063.11,       13089.56,           0.        ],
 [26063.11,       13089.56,           0.        ],
 [27424.91,       13091.4,            0.        ],
 [27424.91,       13091.4,            0.        ],
 [28833.60333333, 12641.65333333,     0.        ],
 [28833.60333333, 12641.65333333,     0.        ],
 [26125.33,        7954.18166667,     0.        ],
 [26125.33,        7954.18166667,     0.        ],
 [26121.29666667,  7956.72633333,     0.        ],
 [26121.29666667,  7956.72633333,     0.        ],
 [26116.26,        7957.80833333,     0.        ],
 [26116.26,        7957.80833333,     0.        ],
 [26110.98333333,  7957.263,          0.        ],
 [26110.98333333,  7957.263,          0.        ],
 [26106.27,        7955.17333333,     0.        ],
 [26106.27,        7955.17333333,     0.        ],
 [26102.84,        7951.85733333,     0.        ],
 [26102.84,        7951.85733333,     0.        ]]

and

idxs = [ 0, 1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 12, 13, 14, 15, 17, 18, 20, 21]

tri = [731, 703, 703, 731, 731, 731, 731, 693, 673, 699, 689, 731, 727, 731, 731, 731, 731, 731, 730]

pnts = [[28035.61081192,  6657.82528209,  2800.  ],
 [27951.42292993,  6561.84728091,  2800.        ],
 [28076.63625815,  6536.92743701,  2800.        ],
 [28139.0775588,   6773.36600593,  2800.        ],
 [27990.76839321,  6805.17674429,  2800.        ],
 [27856.70943257,  6734.2138896,   2800.        ],
 [27799.62835447,  6593.68175023,  2800.        ],
 [27846.23402973,  6449.33687603,  2800.        ],
 [27974.71914494,  6368.71983786,  2800.        ],
 [28124.96408673,  6389.55224384,  2800.        ],
 [28226.66757706,  6502.08637967,  2800.        ],
 [28232.24142249,  6653.66627254,  2800.        ],
 [28382.4101748,  6673.10904354,  2800.        ],
 [28315.56368133,  6812.44564901,  2800.        ],
 [28197.8230677,   6912.54705367,  2800.        ],
 [28049.54675563,  6956.10481526,  2800.        ],
 [27896.37306654,  6935.58740108,  2800.        ],
 [27764.78712281,  6854.54245845,  2800.        ],
 [27677.54132953,  6726.98339422,  2800.        ]]

How can I group the values in idxs, tri and pnts according to idxs, whose entries are row indices into orig, so that entries pointing to identical rows of orig end up in the same group? For example, I would like to get:

idxs = [[0,1], [2,3], [4,5], [7], [8,9], [10,11], [12,13], [14,15], [17], [18], [20,21]]

tri = [[731, 703], [703, 731], [731, 731], [731], [693, 673], [699, 689], [731, 727], [731, 731], [731], [731], [731, 730]]

and

pnts = [[[28035.61081192,  6657.82528209,  2800.  ],
     [27951.42292993,  6561.84728091,  2800.        ]],
     [[28076.63625815,  6536.92743701,  2800.        ],
     [28139.0775588,   6773.36600593,  2800.        ]],
     [[27990.76839321,  6805.17674429,  2800.        ],
     [27856.70943257,  6734.2138896,   2800.        ]],
     [[27799.62835447,  6593.68175023,  2800.        ]],
     [[27846.23402973,  6449.33687603,  2800.        ],
     [27974.71914494,  6368.71983786,  2800.        ]],
     [[28124.96408673,  6389.55224384,  2800.        ],
     [28226.66757706,  6502.08637967,  2800.        ]],
     [[28232.24142249,  6653.66627254,  2800.        ],
     [28382.4101748,  6673.10904354,  2800.        ]],
     [[28315.56368133,  6812.44564901,  2800.        ],
     [28197.8230677,   6912.54705367,  2800.        ]],
     [[28049.54675563,  6956.10481526,  2800.        ]],
     [[27896.37306654,  6935.58740108,  2800.        ]],
     [[27764.78712281,  6854.54245845,  2800.        ],
     [27677.54132953,  6726.98339422,  2800.        ]]]

I tried numpy.split(), but I couldn't find the right condition to split on. Also, in the end I will have to apply the same operation to corresponding arrays with several million entries.

Upvotes: 0

Views: 161

Answers (1)

Ehsan

Reputation: 12417

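You can do this with the numpy_indexed package:

import numpy as np
import numpy_indexed as npi

# Make sure everything is an array so orig[idxs] works (no-op for existing arrays)
orig, idxs, tri, pnts = map(np.asarray, (orig, idxs, tri, pnts))
# Group by the rows of orig that idxs points to
eq = npi.group_by(orig[idxs])
print(eq.split(idxs))
print(eq.split(tri))
print(eq.split(pnts))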

The groups do not come back in their original order, but you can sort them afterwards if you need that.

output:

#idxs
[array([0, 1]), array([20, 21]), array([8, 9]), array([14, 15]), array([2, 3]), array([17]), array([12, 13]), array([18]), array([4, 5]), array([7]), array([10, 11])]
#tri
[array([731, 703]), array([731, 730]), array([693, 673]), array([731, 731]), array([703, 731]), array([731]), array([731, 727]), array([731]), array([731, 731]), array([731]), array([699, 689])]
#pnts
[array([[28035.61081192,  6657.82528209,  2800.        ],
       [27951.42292993,  6561.84728091,  2800.        ]]), array([[27764.78712281,  6854.54245845,  2800.        ],
       [27677.54132953,  6726.98339422,  2800.        ]]), array([[27846.23402973,  6449.33687603,  2800.        ],
       [27974.71914494,  6368.71983786,  2800.        ]]), array([[28315.56368133,  6812.44564901,  2800.        ],
       [28197.8230677 ,  6912.54705367,  2800.        ]]), array([[28076.63625815,  6536.92743701,  2800.        ],
       [28139.0775588 ,  6773.36600593,  2800.        ]]), array([[28049.54675563,  6956.10481526,  2800.        ]]), array([[28232.24142249,  6653.66627254,  2800.        ],
       [28382.4101748 ,  6673.10904354,  2800.        ]]), array([[27896.37306654,  6935.58740108,  2800.        ]]), array([[27990.76839321,  6805.17674429,  2800.        ],
       [27856.70943257,  6734.2138896 ,  2800.        ]]), array([[27799.62835447,  6593.68175023,  2800.        ]]), array([[28124.96408673,  6389.55224384,  2800.        ],
       [28226.66757706,  6502.08637967,  2800.        ]])]

If you want plain lists instead (note that numpy cannot store ragged, non-rectangular groups like the ones above in a regular array):

print(sorted([l.tolist() for l in eq.split(idxs)]))

output:

[[0, 1], [2, 3], [4, 5], [7], [8, 9], [10, 11], [12, 13], [14, 15], [17], [18], [20, 21]]
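If you would rather avoid the extra dependency, the same grouping can probably be done with plain NumPy. The following is only a sketch, not part of numpy_indexed: the helper name group_by_rows is made up for illustration, it assumes idxs is non-empty, and it groups on exact row equality (not approximate similarity). It labels each selected row with np.unique(..., axis=0, return_inverse=True), stable-sorts by label, and cuts with np.split where the label changes. Groups are returned in order of first appearance, so no extra sorting is needed.

```python
import numpy as np

def group_by_rows(orig, idxs, *arrays):
    """Hypothetical helper: group idxs and any parallel arrays by
    identical rows of orig[idxs]. Assumes idxs is non-empty."""
    idxs = np.asarray(idxs)
    keys = np.asarray(orig)[idxs]
    # labels[i] identifies which unique row keys[i] equals
    _, labels = np.unique(keys, axis=0, return_inverse=True)
    labels = labels.ravel()  # guard against shape differences across NumPy versions
    # Stable sort keeps the original order inside each group
    order = np.argsort(labels, kind="stable")
    # Positions in the sorted sequence where the label changes
    cuts = np.flatnonzero(np.diff(labels[order])) + 1
    starts = np.concatenate(([0], cuts))
    # Reorder groups by where their first member appeared in the input
    group_order = np.argsort(order[starts])
    result = []
    for arr in (idxs, *arrays):
        parts = np.split(np.asarray(arr)[order], cuts)
        result.append([parts[g] for g in group_order])
    return result

# Small made-up data set with the same structure as in the question
orig = np.array([[1., 2.], [1., 2.], [0., 5.], [0., 5.], [3., 3.], [3., 3.]])
idxs = np.array([0, 1, 2, 4, 5])
tri = np.array([10, 11, 12, 13, 14])

g_idxs, g_tri = group_by_rows(orig, idxs, tri)
print([g.tolist() for g in g_idxs])  # [[0, 1], [2], [4, 5]]
print([g.tolist() for g in g_tri])   # [[10, 11], [12], [13, 14]]
```

Everything except the final list comprehension is vectorized, so this should scale reasonably to a few million rows, although I have not benchmarked it against numpy_indexed.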

Upvotes: 1
