Reputation: 835
I have two inputs, X1 and X2, and a corresponding label Y. I want to split the data into training and validation sets using scikit-learn's train_test_split. X1 has shape (1920, 12) and X2 has shape (1920, 51, 5). The code I use is:
import numpy as np
from sklearn.model_selection import train_test_split
X1 = np.load('x_train.npy')
X2 = np.load('oneHot.npy')
y_train = np.load('y_train.npy')
X = np.array(list(zip(X1, X2)))  # zip the two inputs together
X_train, X_valid, y_train, y_valid = train_test_split(X, y_train, test_size=0.2)
X1_train, oneHot_train = X_train[:, 0], X_train[:, 1]
However, when I check the shapes of X1_train and oneHot_train, both are (1536,), whereas X1_train should be (1536, 12) and oneHot_train should be (1536, 51, 5). What am I doing wrong here? Any insights will be appreciated.
Upvotes: 1
Views: 1572
Reputation: 16966
train_test_split accepts any number of arrays, as long as they all have the same length along the first axis, and it splits them all with the same shuffled indices. Hence, you can pass x1, x2 and y directly, as below:
import numpy as np
from sklearn.model_selection import train_test_split
# Random stand-ins with the same shapes as the arrays in the question
x1 = np.random.rand(1920, 12)
x2 = np.random.rand(1920, 51, 5)
y = np.random.choice([0, 1], 1920)
x1_train, x1_test, x2_train, x2_test, y_train, y_test = train_test_split(
    x1, x2, y, test_size=0.2)
x1_train.shape, x1_test.shape
# ((1536, 12), (384, 12))
x2_train.shape, x2_test.shape
# ((1536, 51, 5), (384, 51, 5))
y_train.shape, y_test.shape
# ((1536,), (384,))
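As for why the zipped version reports (1536,): because the two inputs have different per-sample shapes, zipping them most likely produced a two-column array of dtype object, where each cell holds a whole per-sample array. Slicing a column of such an array gives a 1-D array of objects. The sketch below reproduces this on a small scale and shows that np.stack can rebuild the regular arrays. The size n = 6 and the explicit loop are assumptions made for illustration (newer NumPy versions may raise an error rather than build the object array implicitly from zip).

```python
import numpy as np

n = 6  # small stand-in for the 1920 samples in the question
X1 = np.random.rand(n, 12)
X2 = np.random.rand(n, 51, 5)

# Build the (n, 2) object array explicitly: each cell stores one per-sample array.
X = np.empty((n, 2), dtype=object)
for i in range(n):
    X[i, 0] = X1[i]
    X[i, 1] = X2[i]

# A column of an object array is 1-D, whatever the cells contain.
col0 = X[:, 0]
print(col0.shape, col0.dtype)

# np.stack joins the per-sample arrays back into regular numeric arrays.
X1_back = np.stack(col0)
X2_back = np.stack(X[:, 1])
print(X1_back.shape, X2_back.shape)
```

So if you already have X_train from the zipped approach, np.stack(X_train[:, 0]) and np.stack(X_train[:, 1]) should recover the expected shapes, although passing the arrays to train_test_split separately avoids the detour.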
Upvotes: 2