Given two NumPy arrays, pick a random item from array A for each unique value in array B

I am trying to implement K-means with selective centroid initialization. I have two NumPy arrays: "features", in which each element is an array representing one data point, and "labels", where labels[i] is the class of the data point at index i. My data points belong to 4 different classes. I want to use both arrays to randomly pick one data point from each class. Could you please help me with this? Also, is there any way to zip two NumPy arrays into a dictionary?

For example, my features array is:

[[1,1,1],[1,2,3],[1,6,7],[1,4,6],[1,6,9],[1,4,2]] and my labels array is [1,2,2,3,1,3]

For each unique value in the labels array, I want one randomly chosen corresponding element of the features array. A sample answer would be:

[1,1,1] from class 1
[1,6,7] from class 2
[1,4,2] from class 3

Upvotes: 0

Views: 232

Answers (3)

user3483203

Reputation: 51155

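You can accomplish this with a bit of indexing and numpy.unique. Shuffle the row indices, then take the first occurrence of each label in the shuffled order; that gives one random row per class:

import numpy as np

perm = np.random.permutation(labels.shape[0])  # random order of row indices

# u: sorted unique labels; first: position of each label's first occurrence in labels[perm]
u, first = np.unique(labels[perm], return_index=True)
idx = perm[first]  # map back to row indices in the original arrays

dict(zip(u, features[idx]))

One possible (random) result:

{1: array([1, 1, 1]), 2: array([1, 6, 7]), 3: array([1, 4, 2])}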
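The idea of shuffling the row indices and keeping each label's first occurrence can be wrapped in a small reusable function. This is a minimal sketch, not a library API: pick_one_per_class is a hypothetical name, and it assumes NumPy's Generator API (np.random.default_rng) so the pick can be seeded for reproducible K-means runs.

```python
import numpy as np


def pick_one_per_class(features, labels, rng=None):
    """Hypothetical helper: return {label: one random row of features with that label}.

    Shuffles the row indices, then keeps the first occurrence of each label
    in the shuffled order, so every row of a class is equally likely.
    """
    rng = np.random.default_rng() if rng is None else rng
    features = np.asarray(features)
    labels = np.asarray(labels)
    perm = rng.permutation(len(labels))  # random order of row indices
    u, first = np.unique(labels[perm], return_index=True)
    # perm[first] maps the shuffled positions back to original row indices
    return dict(zip(u.tolist(), features[perm[first]]))


features = np.array([[1, 1, 1], [1, 2, 3], [1, 6, 7], [1, 4, 6], [1, 6, 9], [1, 4, 2]])
labels = np.array([1, 2, 2, 3, 1, 3])
picked = pick_one_per_class(features, labels, np.random.default_rng(0))
print(picked)
```

Because the whole selection is one permutation plus one np.unique call, there is no Python loop over classes, which helps when there are many classes.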

Upvotes: 0

sentence

Reputation: 8923

Try:

import numpy as np

features = np.array([[1,1,1],[1,2,3],[1,6,7],[1,4,6],[1,6,9],[1,4,2]])
labels = np.array([1,2,2,3,1,3])

# For each class i, pick a random row index among the rows labelled i
res = {i: features[np.random.choice(np.where(labels == i)[0])] for i in set(labels)}

Output:

{1: array([1, 1, 1]), 2: array([1, 2, 3]), 3: array([1, 4, 2])}
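Since the end goal in the question is K-means initialization, here is a sketch of turning one random point per class into an initial centroid matrix and running one assignment/update step. It assumes standard Lloyd iterations (the question does not specify the update rule), uses np.unique instead of set(labels) so centroid order follows the sorted class labels, and the names init_idx, centroids and assign are illustrative only.

```python
import numpy as np

features = np.array([[1, 1, 1], [1, 2, 3], [1, 6, 7], [1, 4, 6], [1, 6, 9], [1, 4, 2]], dtype=float)
labels = np.array([1, 2, 2, 3, 1, 3])
rng = np.random.default_rng(42)

classes = np.unique(labels)  # sorted, unlike set(labels)
# One random row index per class
init_idx = np.array([rng.choice(np.flatnonzero(labels == c)) for c in classes])
centroids = features[init_idx]  # shape (k, n_features)

# One Lloyd iteration (assumed standard K-means update).
# Squared distance from every point to every centroid via broadcasting: shape (n, k)
d2 = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
assign = d2.argmin(axis=1)  # index of the nearest centroid for each point
# Move each centroid to the mean of its points; keep it in place if no point was assigned
new_centroids = np.array([
    features[assign == j].mean(axis=0) if np.any(assign == j) else centroids[j]
    for j in range(len(classes))
])
print(new_centroids)
```

In a full implementation the assignment/update step would be repeated until the assignments stop changing.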

Upvotes: 0

karhershey

Reputation: 84

Given the setup in your question:

import numpy as np
features = [[1,1,1],[1,2,3],[1,6,7],[1,4,6],[1,6,9],[1,4,2]]
labels = np.array([1,2,2,3,1,3])

This picks one random data point for each label and returns the result as a dictionary:

features_index = np.arange(len(features))  # row indices 0..n-1
unique_labels = np.unique(labels)
rand = []
for n in unique_labels:
    # pick a random row index among the rows whose label is n
    rand.append(features[np.random.choice(features_index[labels == n])])
dict(zip(unique_labels, rand))
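On the side question of zipping two NumPy arrays into a dictionary: dict(zip(labels, features)) works, but with repeated labels each later row overwrites the earlier one, so only the last row per class survives. A sketch of the difference, grouping all rows per class first and then picking one at random (the names naive, groups and picked are illustrative; it assumes the Generator API's rng.integers):

```python
import numpy as np

features = np.array([[1, 1, 1], [1, 2, 3], [1, 6, 7], [1, 4, 6], [1, 6, 9], [1, 4, 2]])
labels = np.array([1, 2, 2, 3, 1, 3])

# Zipping directly keeps only the LAST row seen for each repeated label,
# because later keys overwrite earlier ones: 1 -> [1, 6, 9], 2 -> [1, 6, 7], 3 -> [1, 4, 2]
naive = dict(zip(labels.tolist(), features))

# Grouping keeps every row of each class
groups = {lab: features[labels == lab] for lab in np.unique(labels).tolist()}

# Then pick one random row from each group
rng = np.random.default_rng()
picked = {lab: rows[rng.integers(len(rows))] for lab, rows in groups.items()}
print(picked)
```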

Upvotes: 1
