asn

Reputation: 2587

Shuffle and split 2 numpy arrays so as to maintain their ordering with respect to each other

I have 2 numpy arrays X and Y, with shape X: [4750, 224, 224, 3] and Y: [4750,1].

X is the training dataset and Y is the correct output label for each entry.

I want to split the data into train and test sets to validate my machine learning model. The split should be random, but X and Y must stay aligned: after the split, every row of X should still be paired with its corresponding label in Y.

How can I achieve this?

Upvotes: 1

Views: 675

Answers (3)

asn

Reputation: 2587

You can use scikit-learn's train_test_split, which shuffles X and Y with the same permutation and splits them in just two lines of code:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.33)
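
As a quick sanity check, here is a minimal sketch showing that the rows and labels stay paired. The small stand-in arrays and the random_state value are assumptions for illustration only, not the data from the question:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data: 10 "images" of shape (2, 2, 3). The label of row i is i,
# and every pixel of row i is also i, so the pairing is easy to verify.
n = 10
Y = np.arange(n).reshape(n, 1)
X = np.broadcast_to(Y.reshape(n, 1, 1, 1), (n, 2, 2, 3)).copy()

# random_state is fixed here only to make the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.33, random_state=0
)

# Each row of X should still sit next to its own label in both subsets.
pairing_ok = bool(
    np.all(X_train[:, 0, 0, 0] == y_train[:, 0])
    and np.all(X_test[:, 0, 0, 0] == y_test[:, 0])
)
print(pairing_ok, X_train.shape, X_test.shape)
```

Passing both arrays in a single call is what keeps them aligned; calling train_test_split separately on X and on Y would not give matching splits unless the same random_state were used for both.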

Upvotes: 2

Blownhither Ma

Reputation: 1471

sklearn.model_selection.train_test_split is a good choice!

But if you want to craft one of your own:

import numpy as np

def my_train_test_split(X, Y, train_ratio=0.8):
    """return X_train, Y_train, X_test, Y_test"""
    n = X.shape[0]
    split = int(n * train_ratio)
    index = np.arange(n)
    np.random.shuffle(index)
    return X[index[:split]], Y[index[:split]], X[index[split:]], Y[index[split:]]
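
If a reproducible split is needed, the same idea can be sketched with a seeded generator. The function name seeded_train_test_split and the seed parameter are hypothetical additions for illustration, and the tiny arrays are stand-ins:

```python
import numpy as np

def seeded_train_test_split(X, Y, train_ratio=0.8, seed=None):
    """Return X_train, Y_train, X_test, Y_test using one shared permutation."""
    rng = np.random.default_rng(seed)   # seed=None gives a fresh random split
    n = X.shape[0]
    split = int(n * train_ratio)
    index = rng.permutation(n)          # a single permutation for both arrays
    train_idx, test_idx = index[:split], index[split:]
    return X[train_idx], Y[train_idx], X[test_idx], Y[test_idx]

# Stand-in data: row i of X is [2*i, 2*i + 1] and its label is i.
X = np.arange(20).reshape(10, 2)
Y = np.arange(10).reshape(10, 1)
X_train, Y_train, X_test, Y_test = seeded_train_test_split(X, Y, seed=42)

# The first column of each X row should be twice its label if pairing held.
aligned = bool(
    np.all(X_train[:, 0] == 2 * Y_train[:, 0])
    and np.all(X_test[:, 0] == 2 * Y_test[:, 0])
)
print(aligned, X_train.shape, X_test.shape)
```

Indexing both arrays with the same index array is the whole trick: the order is random, but it is the same random order for X and Y.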

Upvotes: 1

Atul Shanbhag

Reputation: 636

This is how I would do it

import numpy as np

def split(x, y, train_ratio=0.7):
  x_size = x.shape[0]
  train_size = int(x_size * train_ratio)
  # Pick train_size distinct row indices at random for the train set
  train_indices = np.random.choice(x_size, size=train_size, replace=False)
  # Boolean mask: True for train rows, False for test rows
  mask = np.zeros(x_size, dtype=bool)
  mask[train_indices] = True
  # The same mask indexes both arrays, so rows and labels stay paired
  x_train, y_train = x[mask], y[mask]
  x_test, y_test = x[~mask], y[~mask]
  return (x_train, y_train), (x_test, y_test)

I simply choose, at random, the number of indices I need for the train set; the remaining indices go to the test set.

Then I use a boolean mask to select the train and test samples from both arrays.
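
One thing worth knowing about the mask approach: boolean indexing returns rows in their original order, so each subset is a random selection but is not itself shuffled. A minimal sketch with stand-in arrays (the seeded generator and the optional reshuffle step are assumptions added for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data: x[i] is 10 * i and its label y[i] is i.
x = np.arange(10) * 10
y = np.arange(10)

train_indices = rng.choice(10, size=7, replace=False)
mask = np.zeros(10, dtype=bool)
mask[train_indices] = True
x_train, y_train = x[mask], y[mask]

# Masking keeps ascending row order, whatever order the indices were drawn in.
is_sorted = bool(np.all(np.diff(y_train) > 0))

# Optional: reshuffle the train subset with one shared permutation.
perm = rng.permutation(len(y_train))
x_train, y_train = x_train[perm], y_train[perm]
print(is_sorted, x_train, y_train)
```

If the training loop shuffles each epoch anyway, the reshuffle step can be skipped.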

Upvotes: 2
