konstantin

Reputation: 893

Pick random samples from a 2D matrix and keep the indexes in Python

I have a 2D NumPy matrix of data in Python and I want to downsample it by keeping 25% of the initial samples. To do so, I am using np.random.randint as follows:

reduced_train_face = face_train[np.random.randint(face_train.shape[0], size=300), :]

However, I also have a second matrix that contains the labels associated with the faces, and I want to reduce it in the same way. How can I keep the indexes used for the reduced matrix and apply them to the train_lbls matrix?

Upvotes: 2

Views: 684

Answers (2)

Nuageux

Reputation: 1686

You can fix the seed just before applying your extraction:

import numpy as np

# Each label corresponds to the first element of each row of face_train
labels_train = np.array(range(0,15,3))
face_train = np.array(range(15)).reshape(5,3)
np.random.seed(0)
reduced_train_face = face_train[np.random.randint(face_train.shape[0], size=3), :]
np.random.seed(0)
reduced_train_labels = labels_train[np.random.randint(labels_train.shape[0], size=3)]

print(reduced_train_face, reduced_train_labels)
# [[12, 13, 14], [ 0,  1,  2], [ 9, 10, 11]], [12,  0,  9]

With the same seed, both arrays are reduced in the same way.

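Edit: I advise you to use np.random.choice(n_total_elem, n_reduce_elem, replace=False) to ensure that each sample is chosen at most once. Without replace=False, np.random.choice samples with replacement, just like np.random.randint, so the same row can be picked twice.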
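As a minimal sketch of that suggestion, reusing the toy arrays from above: the index array is drawn once and applied to both arrays, so no reseeding is needed. The name idx is only illustrative, and n_total_elem and n_reduce_elem are set here to match the example sizes.

```python
import numpy as np

# Same toy data as above: each label equals the first element of its row
face_train = np.arange(15).reshape(5, 3)
labels_train = np.arange(0, 15, 3)

n_total_elem = face_train.shape[0]
n_reduce_elem = 3

# replace=False guarantees that no row index is drawn twice
idx = np.random.choice(n_total_elem, n_reduce_elem, replace=False)

# Apply the same indexes to the data and to the labels
reduced_train_face = face_train[idx, :]
reduced_train_labels = labels_train[idx]

print(idx)
print(reduced_train_face)
print(reduced_train_labels)
```

Because the indexes are kept in idx, the rows and labels stay aligned whatever the random state is.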

Upvotes: 1

Gabe

Reputation: 446

Why don't you keep the selected indexes and use them to select data from both matrices?

import numpy as np

# setting up matrices
np.random.seed(1234)  # make the example repeatable
                      # the seeding is optional; it only serves to show the
                      # same results as below
face_train = np.random.rand(8,3)
train_lbls = np.random.rand(8)

print('face_train:\n', face_train)
print('labels:\n', train_lbls)

# Drawing the random indexes
random_idxs = np.random.randint(face_train.shape[0], size=4)
print('random_idxs:\n', random_idxs)

# Using the indexes to select rows from both matrices
reduced_train_face = face_train[random_idxs, :]
reduced_labels = train_lbls[random_idxs]
print('reduced_train_face:\n', reduced_train_face)
print('reduced_labels:\n', reduced_labels)

This gives the following output:

face_train:
 [[ 0.19151945  0.62210877  0.43772774]
 [ 0.78535858  0.77997581  0.27259261]
 [ 0.27646426  0.80187218  0.95813935]
 [ 0.87593263  0.35781727  0.50099513]
 [ 0.68346294  0.71270203  0.37025075]
 [ 0.56119619  0.50308317  0.01376845]
 [ 0.77282662  0.88264119  0.36488598]
 [ 0.61539618  0.07538124  0.36882401]]
labels:
 [ 0.9331401   0.65137814  0.39720258  0.78873014  0.31683612  0.56809865
  0.86912739  0.43617342]
random_idxs:
 [1 7 5 4]
reduced_train_face:
 [[ 0.78535858  0.77997581  0.27259261]
 [ 0.61539618  0.07538124  0.36882401]
 [ 0.56119619  0.50308317  0.01376845]
 [ 0.68346294  0.71270203  0.37025075]]
reduced_labels:
 [ 0.65137814  0.43617342  0.56809865  0.31683612]
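For the 25% case from the question, the same idea could be wrapped in a small helper. This is only a sketch: downsample_pair, fraction and seed are hypothetical names, and it samples without replacement through np.random.default_rng (NumPy 1.17+), which differs from the randint call above.

```python
import numpy as np

def downsample_pair(data, labels, fraction=0.25, seed=None):
    """Hypothetical helper: keep a random fraction of rows from data and labels."""
    rng = np.random.default_rng(seed)
    n_keep = int(round(fraction * data.shape[0]))
    # One index array, drawn without replacement, is shared by both arrays
    idx = rng.choice(data.shape[0], size=n_keep, replace=False)
    return data[idx], labels[idx], idx

# Toy data: label i belongs to row i, whose first element is 3 * i
face_train = np.arange(24).reshape(8, 3)
train_lbls = np.arange(8)

reduced_face, reduced_lbls, idx = downsample_pair(face_train, train_lbls, seed=0)
print(idx)
print(reduced_face)
print(reduced_lbls)
```

Returning idx alongside the reduced arrays makes it possible to apply the same selection to any further array later on.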

Upvotes: 1
