Reputation: 107
I have two arrays of the same length, one of labels and one of distances, e.g.:
labels= array([3, 1, 0, 1, 3, 2, 3, 2, 1, 1, 3, 1, 2, 1, 3, 2, 2, 3, 3, 3, 2, 3,
0, 3, 3, 2, 3, 2, 3, 2,...])
distances = array([2.32284095, 0.36254613, 0.95734965, 0.35429638, 2.79098656,
5.45921793, 2.63795657, 1.34516461, 1.34028463, 1.10808795,
1.60549826, 1.42531201, 1.16280383, 1.22517273, 4.48511033,
0.71543217, 0.98840598,...])
What I want to do is group the values from distances into N arrays, where N is the number of unique label values (in this case N=4). So all values with label 3 go in one array, all values with label 2 in another, and so on.
I can think of a simple brute-force approach with loops and if-conditions, but it will cause a serious slowdown on large arrays. I feel there are better ways of doing this using a list comprehension, NumPy, or something else; I'm just not sure what. What would be the best, most efficient approaches?
"Brute force" example for reference (note: len(labels) == len(distances)):
import numpy as np

all_distance_arrays = []
for label_id in np.unique(labels):
    sorted_distances = []
    # Scan every element and keep the distances whose label matches
    for index in range(len(labels)):
        if label_id == labels[index]:
            sorted_distances.append(distances[index])
    all_distance_arrays.append(sorted_distances)
Upvotes: 1
Views: 1127
Reputation: 114230
You can do this with NumPy functions only. First sort the arrays in lockstep (which is what np.unique does under the hood anyway), then split them where the label changes:
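import numpy as np

i = np.argsort(labels, kind="stable")  # stable sort keeps the original order within each label
labels = labels[i]
distances = distances[i]
# Positions where the label value changes mark the start of a new group
split_points = np.flatnonzero(np.diff(labels)) + 1
result = np.split(distances, split_points)
unique_labels = labels[np.r_[0, split_points]]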
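A self-contained run of the sort-and-split approach on the first 17 values of the sample arrays from the question (the rest is elided there). Zipping the labels and groups into a dict is an illustrative choice added here, not part of the answer:

```python
import numpy as np

# First 17 entries of the sample arrays shown in the question
labels = np.array([3, 1, 0, 1, 3, 2, 3, 2, 1, 1, 3, 1, 2, 1, 3, 2, 2])
distances = np.array([2.32284095, 0.36254613, 0.95734965, 0.35429638, 2.79098656,
                      5.45921793, 2.63795657, 1.34516461, 1.34028463, 1.10808795,
                      1.60549826, 1.42531201, 1.16280383, 1.22517273, 4.48511033,
                      0.71543217, 0.98840598])

# Stable sort keeps the original order of distances within each label
order = np.argsort(labels, kind="stable")
sorted_labels = labels[order]
sorted_distances = distances[order]

# Indices where the label changes mark the start of each new group
split_points = np.flatnonzero(np.diff(sorted_labels)) + 1
groups = np.split(sorted_distances, split_points)
unique_labels = sorted_labels[np.r_[0, split_points]]

# Map each label to its array of distances
grouped = dict(zip(unique_labels.tolist(), groups))
```

Only one sort is needed regardless of how many unique labels there are, so the cost is dominated by the O(n log n) sort.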
Upvotes: 0
Reputation: 1480
"Brute force" seems likely to be adequate with a reasonable number of labels:
from collections import defaultdict

# Single pass: append each distance to the list for its label
dist_group = defaultdict(list)
for lb, ds in zip(labels, distances):
    dist_group[lb].append(ds)
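If NumPy arrays ordered by label are wanted instead of lists, the defaultdict result can be converted afterwards. A minimal sketch on the first 17 sample values; calling `.tolist()` before iterating is an added choice so the dict keys are plain Python ints rather than NumPy scalars:

```python
from collections import defaultdict

import numpy as np

# First 17 entries of the sample arrays shown in the question
labels = np.array([3, 1, 0, 1, 3, 2, 3, 2, 1, 1, 3, 1, 2, 1, 3, 2, 2])
distances = np.array([2.32284095, 0.36254613, 0.95734965, 0.35429638, 2.79098656,
                      5.45921793, 2.63795657, 1.34516461, 1.34028463, 1.10808795,
                      1.60549826, 1.42531201, 1.16280383, 1.22517273, 4.48511033,
                      0.71543217, 0.98840598])

# Single pass over both arrays, grouping distances by label
dist_group = defaultdict(list)
for lb, ds in zip(labels.tolist(), distances.tolist()):
    dist_group[lb].append(ds)

# Convert each list to an array, with the groups ordered by label
groups = {lb: np.array(vals) for lb, vals in sorted(dist_group.items())}
```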
It's hard to tell why this would not fit your purposes.
Upvotes: 0
Reputation: 2816
Using just NumPy:
import numpy as np

_, counts = np.unique(labels, return_counts=True)  # counts is the number of occurrences of each label
sor = labels.argsort()
sections = np.cumsum(counts)  # end index of each slice
labels_sor = np.split(labels[sor], sections)[:-1]  # the last split is empty, so drop it
distances_sor = np.split(distances[sor], sections)[:-1]
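A variant of the same counts-based split, run on the first 17 sample values. Passing `sections[:-1]` to `np.split` means no trailing empty array is produced, so the `[:-1]` slice is not needed; `kind="stable"` is an added assumption that keeps each group in its original order:

```python
import numpy as np

# First 17 entries of the sample arrays shown in the question
labels = np.array([3, 1, 0, 1, 3, 2, 3, 2, 1, 1, 3, 1, 2, 1, 3, 2, 2])
distances = np.array([2.32284095, 0.36254613, 0.95734965, 0.35429638, 2.79098656,
                      5.45921793, 2.63795657, 1.34516461, 1.34028463, 1.10808795,
                      1.60549826, 1.42531201, 1.16280383, 1.22517273, 4.48511033,
                      0.71543217, 0.98840598])

unique_labels, counts = np.unique(labels, return_counts=True)
order = labels.argsort(kind="stable")
# Cumulative counts give the end index of each group; the last one equals len(labels)
sections = np.cumsum(counts)
# Splitting at every end index except the last yields exactly one array per label
distance_groups = np.split(distances[order], sections[:-1])
```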
Upvotes: 1
Reputation:
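A simple list comprehension is short and reasonably fast when there are few unique labels, since each comparison is vectorized (it makes one pass over labels per unique label):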
groups = [distances[labels == i] for i in np.unique(labels)]
Output (for the first 17 values of each array shown in the question):
>>> groups
[array([0.95734965]),
array([0.36254613, 0.35429638, 1.34028463, 1.10808795, 1.42531201,
1.22517273]),
array([5.45921793, 1.34516461, 1.16280383, 0.71543217, 0.98840598]),
array([2.32284095, 2.79098656, 2.63795657, 1.60549826, 4.48511033])]
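To check that the approaches agree, this sketch compares the list-comprehension result with the sort-and-split result on the same first 17 sample values. Both return groups in ascending label order, and with a stable sort both keep the original order inside each group, so the arrays should match element by element:

```python
import numpy as np

# First 17 entries of the sample arrays shown in the question
labels = np.array([3, 1, 0, 1, 3, 2, 3, 2, 1, 1, 3, 1, 2, 1, 3, 2, 2])
distances = np.array([2.32284095, 0.36254613, 0.95734965, 0.35429638, 2.79098656,
                      5.45921793, 2.63795657, 1.34516461, 1.34028463, 1.10808795,
                      1.60549826, 1.42531201, 1.16280383, 1.22517273, 4.48511033,
                      0.71543217, 0.98840598])

# Boolean-mask approach: one pass over labels per unique label
groups_mask = [distances[labels == i] for i in np.unique(labels)]

# Sort-and-split approach: one stable sort, then split where the label changes
order = np.argsort(labels, kind="stable")
split_points = np.flatnonzero(np.diff(labels[order])) + 1
groups_split = np.split(distances[order], split_points)

same = len(groups_mask) == len(groups_split) and all(
    np.array_equal(a, b) for a, b in zip(groups_mask, groups_split)
)
print(same)  # True
```

For many unique labels the single sort tends to scale better than repeated masking, but timing on your own data is the safest way to choose.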
Upvotes: 2