Reputation: 353
When should we use
X_train, X_test, y_train, y_test = train_test_split(data, test_size=0.3, random_state=42)
and when should we use
train, test = train_test_split(data, test_size=0.3, random_state=0)
The former raises this error:
ValueError: not enough values to unpack (expected 4, got 2)
Upvotes: 8
Views: 4588
Reputation: 749
If you pass 1 data list, it is split into 2:
                             |---data_train
data ----train_test_split()--|
                             |---data_test
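If you pass 2 data lists, EACH of them is split into 2, which makes 4 in total:
                                       |---data_train_x
                                       |---data_train_y
data_x, data_y ----train_test_split()--|
                                       |---data_test_x
                                       |---data_test_y
The same applies to n data lists: you get 2n outputs. Note that the outputs are returned in the order train, test for each input in turn (data_train_x, data_test_x, data_train_y, data_test_y).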
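To make the counting concrete, here is a minimal sketch. The arrays X and y are made-up toy data (not from the question), and it assumes scikit-learn and NumPy are installed.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (assumption for illustration): 10 samples, 2 features, 10 labels.
# Row i of X is [2*i, 2*i + 1], and its label is i.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# One input array -> two outputs (train part, test part).
one = train_test_split(X, test_size=0.3, random_state=0)

# Two input arrays -> four outputs, in the order
# X_train, X_test, y_train, y_test.
two = train_test_split(X, y, test_size=0.3, random_state=0)
X_train, X_test, y_train, y_test = two

print(len(one), len(two))
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
```

Both arrays are shuffled with the same indices, so each row of X_train still lines up with its label in y_train.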
Upvotes: 1
Reputation: 48327
train_test_split accepts as many arrays as positional arguments as you need. But since you want four returned values, you have to pass 2 arrays as arguments:
X_train, X_test, y_train, y_test= train_test_split(data, y_data, test_size=0.3, random_state=42)
If you need to pass many arrays, you can collect them in a list and unpack it into the call with the * operator:
train_test_split(*arrays, test_size=test_size, random_state=0)
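As a rough sketch of the unpacking form: the three arrays below (features, labels, weights) are hypothetical names chosen for illustration, and scikit-learn and NumPy are assumed to be installed.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical inputs: three arrays that all have 10 samples.
features = np.arange(20).reshape(10, 2)
labels = np.arange(10)
weights = np.linspace(0.1, 1.0, 10)

arrays = [features, labels, weights]

# *arrays expands the list into three positional arguments,
# so the call returns 2 * 3 = 6 arrays.
splits = train_test_split(*arrays, test_size=0.3, random_state=0)

# Outputs come back as (train, test) pairs, one pair per input.
f_train, f_test, l_train, l_test, w_train, w_test = splits
print(len(splits))
```

All inputs must have the same number of samples, otherwise train_test_split raises a ValueError.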
Upvotes: 1
Reputation: 5589
Use the first form when you want to split instances that have both features (X) and labels (y). Use the second form when you only want to split a single dataset, such as the features (X) alone.
X_train, X_test, y_train, y_test= train_test_split(data, y, test_size=0.3, random_state=42)
It didn't work for you because you didn't provide the label data in your train_test_split() call, so only two values came back instead of four. The above should work well. Just replace y with your label/target data.
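For reference, the error from the question can be reproduced and fixed roughly like this. The data and y arrays here are placeholders invented for the example, and scikit-learn and NumPy are assumed to be installed.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data (assumption): 10 samples with 3 features, plus 10 labels.
data = np.arange(30).reshape(10, 3)
y = np.arange(10) % 2

# Passing only one array returns two values, so unpacking into
# four names fails before any of them is assigned.
try:
    X_train, X_test, y_train, y_test = train_test_split(
        data, test_size=0.3, random_state=42
    )
    message = None
except ValueError as err:
    message = str(err)

print(message)

# Passing the labels as a second array returns four values.
X_train, X_test, y_train, y_test = train_test_split(
    data, y, test_size=0.3, random_state=42
)
```

The ValueError comes from Python's tuple unpacking, not from scikit-learn itself: the function happily returns two arrays, and the assignment on the left-hand side is what expects four.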
Upvotes: 1