scutnex

Reputation: 861

Randomly selecting positive and negative data from array

I have written the following function:

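from itertools import compress
from random import sample
def searchPosotive(X, y, num):
    # keep the rows of X whose label in y is 1, then pick num of them at random
    pos = sample(list(compress(X, y)), num)
    return pos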

This function takes two NumPy arrays, X and y. The two arrays are related: y[i] is the label for X[i], and each label is either 1 or 0.

The function randomly picks num rows of X whose corresponding y value is 1 and returns them as a list of num rows, each of length n, where n is the number of columns in X.

I also need a list of the indices of the rows that were selected. For example, if a row of pos equals X[a], then a should be in that list. How can I do this?

I also need to do this when looking for negative examples. The function I currently use for that is:

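import numpy as np

def searchNegative(X, y, num):
    mat = X[y == 0]  # rows of X labelled 0
    # positions within the negative subset, drawn without replacement
    rows = np.random.choice(len(mat), size=num, replace=False)
    mat = mat[rows, :]
    return mat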
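A minimal sketch of how both functions above could also return the indices, assuming y is a 1-D array of 0/1 labels. The idea is to sample row positions instead of rows: for the positive case, compress is applied to range(len(y)) rather than to X; for the negative case, the positions drawn from the filtered subset are mapped back through that subset's original indices. The `_with_index` function names are hypothetical:

```python
from itertools import compress
from random import sample

import numpy as np


def search_positive_with_index(X, y, num):
    # Hypothetical helper: positions i where y[i] is truthy (label 1), ascending.
    pos_index = list(compress(range(len(y)), y))
    # num distinct positions; random.sample raises ValueError if num is too large.
    rows = sample(pos_index, num)
    # Fancy indexing with a list of ints returns a (num, n) array.
    return X[rows], rows


def search_negative_with_index(X, y, num):
    # Hypothetical helper: original row indices of the label-0 examples.
    neg_index = np.where(y == 0)[0]
    # Positions inside the negative subset, drawn without replacement.
    picks = np.random.choice(len(neg_index), size=num, replace=False)
    # Map subset positions back to row indices of X.
    rows = neg_index[picks]
    return X[rows, :], rows
```

In both cases X[rows] reproduces the sampled rows, so every returned index a satisfies the pos-row == X[a] property described above.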

Upvotes: 2

Views: 204

Answers (1)

juanpa.arrivillaga

Reputation: 96172

Use np.where to get the indices of your positive (or negative) y values, then sample from those indices. Here's a function for the positive case; you can either modify it to select positive or negative examples, or write a separate function for the negative case. First, assume:

>>> y
array([1, 0, 1, 1, 1, 0, 0, 1, 0, 1])
>>> X
array([[-25,  62,  94,  70,  96,  70,  38, -18, -57,   1],
       [ 40,  86, -98, -48,  40,  29,   4, -83,  44, -12],
       [ 57,  23, -96,  97, -24, -93, -33, -64,  61,  15],
       [ 44,  29,  31, -38,  11,  85,  37, -96, -37, -70],
       [-10, -37, -24, -66,  27, -44, -16, -50,   3, -91],
       [-97,  81,  52,  41,  39, -14,  95,  76,  28, -32],
       [-74,  49, -91, -65, -96,  86, -13,  43,  22,  80],
       [  5,  20, -77,  74, -89,  46, -90,  95,  30,  13],
       [ 36,   6,  55, -74, -49, -66,  38,  37, -84,  28],
       [-23, -28, -32, -30,  -4, -52,  -4,  99, -67, -98]])

And so...

>>> def sample_positive(X, y, num):
...     pos_index = np.where(y == 1)[0]
...     rows = np.random.choice(pos_index, size=num, replace=False)
...     mat = X[rows,:]
...     return (mat, rows)
...
>>> X_sample, idx = sample_positive(X, y, 2)
>>> X_sample
array([[-23, -28, -32, -30,  -4, -52,  -4,  99, -67, -98],
       [-10, -37, -24, -66,  27, -44, -16, -50,   3, -91]])
>>> idx
array([9, 4])
>>> X
array([[-25,  62,  94,  70,  96,  70,  38, -18, -57,   1],
       [ 40,  86, -98, -48,  40,  29,   4, -83,  44, -12],
       [ 57,  23, -96,  97, -24, -93, -33, -64,  61,  15],
       [ 44,  29,  31, -38,  11,  85,  37, -96, -37, -70],
       [-10, -37, -24, -66,  27, -44, -16, -50,   3, -91],
       [-97,  81,  52,  41,  39, -14,  95,  76,  28, -32],
       [-74,  49, -91, -65, -96,  86, -13,  43,  22,  80],
       [  5,  20, -77,  74, -89,  46, -90,  95,  30,  13],
       [ 36,   6,  55, -74, -49, -66,  38,  37, -84,  28],
       [-23, -28, -32, -30,  -4, -52,  -4,  99, -67, -98]])
>>> y
array([1, 0, 1, 1, 1, 0, 0, 1, 0, 1])
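Following the suggestion to let one function select either class, here is a sketch that takes the label as a parameter. It swaps in NumPy's Generator API (np.random.default_rng) and np.flatnonzero in place of the legacy np.random.choice and np.where(...)[0]; the name sample_by_label, the seed parameter, and the small X used in the usage lines are assumptions for illustration:

```python
import numpy as np


def sample_by_label(X, y, label, num, seed=None):
    # Hypothetical generalization of sample_positive: label=1 positive, label=0 negative.
    rng = np.random.default_rng(seed)
    # Indices of every row whose label matches.
    candidates = np.flatnonzero(y == label)
    # Draw num distinct indices; raises ValueError if num exceeds len(candidates).
    rows = rng.choice(candidates, size=num, replace=False)
    return X[rows], rows


# Usage: the indices come back with the rows, so X[idx] reproduces the sample.
y = np.array([1, 0, 1, 1, 1, 0, 0, 1, 0, 1])
X = np.arange(30).reshape(10, 3)
X_neg, idx = sample_by_label(X, y, label=0, num=2, seed=0)
```

Passing a seed makes the draw reproducible, which can help when checking that the returned indices line up with the returned rows.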

Upvotes: 3
