John

Reputation: 835

Sklearn's train_test_split not working on multiple inputs

I have two inputs, X1 and X2, and a corresponding label Y. I want to split the data into training and validation sets using scikit-learn's train_test_split. X1 has shape (1920, 12) and X2 has shape (1920, 51, 5). The code I use is:

import numpy as np
from sklearn.model_selection import train_test_split

X1 = np.load('x_train.npy')
X2 = np.load('oneHot.npy')
y_train = np.load('y_train.npy')

X = np.array(list(zip(X1, X2))) ### To zip the two inputs.

X_train, X_valid, y_train, y_valid = train_test_split(X, y_train, test_size=0.2)

X1_train, oneHot_train = X_train[:, 0], X_train[:, 1]

However, when I check the shapes of X1_train and oneHot_train, both are (1536,), whereas X1_train should be (1536, 12) and oneHot_train should be (1536, 51, 5). What am I doing wrong here? Any insights will be appreciated.

Upvotes: 1

Views: 1572

Answers (1)

Venkatachalam

Reputation: 16966

train_test_split accepts any number of arrays (or other indexables) with the same first dimension and splits them all with the same shuffle. Hence, you can pass x1 and x2 directly, as below:

import numpy as np
from sklearn.model_selection import train_test_split

x1 = np.random.rand(1920, 12)

x2 = np.random.rand(1920,51,5)

y = np.random.choice([0,1], 1920)

x1_train, x1_test, x2_train, x2_test, y_train, y_test = train_test_split(
    x1, x2, y, test_size=0.2)

x1_train.shape, x1_test.shape
 # ((1536, 12), (384, 12))

x2_train.shape, x2_test.shape
 # ((1536, 51, 5), (384, 51, 5))

y_train.shape, y_test.shape
 # ((1536,), (384,))
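
As for why the zipped version reports (1536,): pairing rows of shape (12,) with rows of shape (51, 5) cannot give a regular numeric array, so the result is an object array whose cells each hold a whole sub-array. Slicing a column out of it therefore gives a 1-D object array, and np.stack is needed to turn it back into a regular array. Below is a minimal sketch of that, assuming random stand-in data with the question's shapes. The names pairs and col0 are only illustrative, and the object array is built explicitly because, as far as I know, recent NumPy versions raise an error on np.array(list(zip(X1, X2))) without dtype=object.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data with the shapes from the question (random values, not the real files).
rng = np.random.default_rng(0)
X1 = rng.random((1920, 12))
X2 = rng.random((1920, 51, 5))
y = rng.integers(0, 2, size=1920)

# Pair the rows in an object array: each cell holds one whole sub-array.
pairs = np.empty((len(X1), 2), dtype=object)
for i in range(len(X1)):
    pairs[i, 0] = X1[i]
    pairs[i, 1] = X2[i]

pairs_train, pairs_valid, y_train, y_valid = train_test_split(
    pairs, y, test_size=0.2, random_state=0
)

# A column of the object array is 1-D: NumPy cannot see inside the cells.
col0 = pairs_train[:, 0]
print(col0.shape, col0.dtype)

# Stacking the cells rebuilds regular arrays with the expected shapes.
X1_train = np.stack(col0)
X2_train = np.stack(pairs_train[:, 1])
print(X1_train.shape, X2_train.shape)
```

This works, but passing the arrays to train_test_split directly, as in the answer, avoids the object array and the Python loop altogether.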

Upvotes: 2
