Jose Ramon

Reputation: 5386

Split a numpy array using masking in python

I have a numpy array my_arr of size 100x20. I want to write a function that takes a 2D numpy array my_arr and an index x as input and returns two arrays: test_arr, of size 1x20, and train_arr, of size 99x20. test_arr should be the row of my_arr at index x, and train_arr should contain the remaining rows. I tried a solution based on masking:

def split_train_test(my_arr, x):

   a = np.ma.array(my_arr, mask=False)
   a.mask[x, :] = True
   a = np.array(a.compressed())
   return a

This does not work the way I wanted. How can I get a numpy array back and return both the train and test arrays correctly?

Upvotes: 1

Views: 1986

Answers (2)

akilat90

Reputation: 5706

You can also use a boolean index as the mask:

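import numpy as np

def split_train_test(my_arr, x):

    # define a boolean mask with one entry per row
    mask = np.zeros(my_arr.shape[0], dtype=bool)
    mask[x] = True  # True only at index x, False elsewhere

    # rows where mask is True go to test, all other rows go to train
    return my_arr[mask, :], my_arr[~mask, :]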

Sample run:

test_arr, train_arr = split_train_test(np.random.rand(100, 20), x=10)

print(test_arr.shape, train_arr.shape)
(1, 20) (99, 20)

EDIT:

If someone is looking for the general case, where more than one row needs to be allocated to the test array (say, an 80%-20% split), x can also be an array of indices:

my_arr = np.random.rand(100, 20)
x = np.random.choice(np.arange(my_arr.shape[0]), int(my_arr.shape[0]*0.8), replace=False)

test_arr, train_arr = split_train_test(my_arr, x)
print(test_arr.shape, train_arr.shape)
(80, 20) (20, 20)
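
As for why the attempt in the question does not behave as expected: compressed() returns the unmasked values as a 1-D array, so the row structure is lost. A minimal sketch of a repaired version follows, assuming my_arr is 2-D and x is a single integer index. The name split_train_test_masked is made up here for illustration and is not from the question.

```python
import numpy as np

def split_train_test_masked(my_arr, x):
    # Hypothetical name: a repaired version of the masked-array attempt.
    a = np.ma.array(my_arr, mask=False)
    a.mask[x, :] = True  # hide every element of row x
    # compressed() flattens the unmasked values to 1-D,
    # so restore the original number of columns with reshape.
    train_arr = a.compressed().reshape(-1, my_arr.shape[1])
    test_arr = my_arr[x:x + 1]  # slicing keeps the 2-D shape (1, n_cols)
    return test_arr, train_arr

my_arr = np.arange(10).reshape(5, 2)
test_arr, train_arr = split_train_test_masked(my_arr, 2)
print(test_arr.shape, train_arr.shape)
```

The boolean-index version above is probably simpler in practice, since it avoids the masked-array round trip altogether.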

Upvotes: 0

akuiper

Reputation: 215117

You can use simple indexing and numpy.delete for this. Note that this version returns the train array first and the test array second:

def split_train_test(my_arr, x):
    return np.delete(my_arr, x, 0), my_arr[x:x+1]

my_arr = np.arange(10).reshape(5,2)

train, test = split_train_test(my_arr, 2)

train
#array([[0, 1],
#       [2, 3],
#       [6, 7],
#       [8, 9]])

test
#array([[4, 5]])
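
The same idea appears to extend to several test rows, since numpy.delete also accepts a sequence of indices. The sketch below assumes x is either an integer or a list of unique, valid row indices. The name split_train_test_multi is hypothetical and only used for illustration.

```python
import numpy as np

def split_train_test_multi(my_arr, x):
    # Hypothetical helper: x may be an int or a sequence of row indices.
    idx = np.atleast_1d(x)
    train = np.delete(my_arr, idx, axis=0)  # every row except those in idx
    test = my_arr[idx]  # integer-array indexing keeps the result 2-D
    return train, test

my_arr = np.arange(10).reshape(5, 2)
train, test = split_train_test_multi(my_arr, [1, 3])
print(train.shape, test.shape)
```

Using np.atleast_1d means a single integer index still yields a 2-D test array, matching the my_arr[x:x+1] slice used above.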

Upvotes: 2
