Reputation: 353

How to split training and test sets?

When should we use

X_train, X_test, y_train, y_test = train_test_split(data, test_size=0.3, random_state=42)

and when should we use

train, test = train_test_split(data, test_size=0.3, random_state=0)

The former raises this error:

ValueError: not enough values to unpack (expected 4, got 2)

Upvotes: 8

Answers (3)

Leoli

Reputation: 749

If you pass one data list, it is split into two parts:

                             |---data_train
data ----train_test_split()--|
                             |---data_test
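A minimal sketch of the one-list case, assuming scikit-learn and NumPy are installed; the toy array data is made up for illustration:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 samples with 2 columns each
data = np.arange(20).reshape(10, 2)

# One input array -> two outputs: the train part and the test part
data_train, data_test = train_test_split(data, test_size=0.3, random_state=0)

print(data_train.shape)  # (7, 2)
print(data_test.shape)   # (3, 2)
```

With test_size=0.3 and 10 samples, 3 rows go to the test part and the remaining 7 to the train part.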

If you pass two data lists, each of them is split into two parts, giving four outputs in total:

                                       |---data_train_x
                                       |---data_test_x
data_x, data_y ----train_test_split()--|
                                       |---data_train_y
                                       |---data_test_y

The same holds for n data lists: you get 2n outputs, a train part and a test part for each input.
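A sketch of the two-list case, assuming scikit-learn and NumPy are installed; data_x and data_y are hypothetical toy arrays. scikit-learn returns the outputs grouped per input (x train, x test, then y train, y test), and rows stay aligned between the inputs:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: features and matching labels for 10 samples
data_x = np.arange(20).reshape(10, 2)  # row i is [2*i, 2*i + 1]
data_y = np.arange(10)                 # label of row i is i

# Two input arrays -> four outputs, ordered (x_train, x_test, y_train, y_test)
data_train_x, data_test_x, data_train_y, data_test_y = train_test_split(
    data_x, data_y, test_size=0.3, random_state=42
)

# Rows stay aligned: in this toy data, x[:, 0] // 2 equals the label
print((data_train_x[:, 0] // 2 == data_train_y).all())  # True
```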

Upvotes: 1

Mihai Alexandru-Ionut

Reputation: 48427

The train_test_split function accepts as many arrays as you need as positional arguments and returns a train part and a test part for each of them.

So, since you unpack four returned values, you have to pass two arrays as arguments.

X_train, X_test, y_train, y_test= train_test_split(data, y_data, test_size=0.3, random_state=42)

If you need to pass many arrays, you can keep them in a list and unpack it into separate arguments with the * operator.

train_test_split(*arrays, test_size=test_size, random_state=0)
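A minimal sketch of the *arrays call, assuming scikit-learn and NumPy are installed; the list arrays and its contents are hypothetical:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical list of three aligned arrays (10 samples each)
arrays = [np.arange(10), np.arange(10, 20), np.arange(20, 30)]

# The * operator unpacks the list into three separate positional arguments
splits = train_test_split(*arrays, test_size=0.3, random_state=0)
print(len(splits))  # 6: a (train, test) pair for each of the 3 arrays

# Pairs come back in input order
a_train, a_test, b_train, b_test, c_train, c_test = splits
```

The same row indices are used for every array, so b_train - a_train is 10 everywhere in this toy data.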

Upvotes: 1

MrLeeh

Reputation: 5589

Use the first form if you want to split both the features (X) and the labels (y). Use the second form if you only want to split a single array, for example only the features (X).

X_train, X_test, y_train, y_test= train_test_split(data, y, test_size=0.3, random_state=42)

It didn't work for you because you didn't provide the label data to train_test_split(), so it returned only two arrays instead of four. The above should work; just replace y with your label/target data.
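A small sketch that reproduces the error and the fix, assuming scikit-learn and NumPy are installed; data and y are hypothetical toy arrays:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical features and labels for 10 samples
data = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

# Without y, only two arrays come back, so unpacking into four names fails
try:
    X_train, X_test, y_train, y_test = train_test_split(data, test_size=0.3, random_state=42)
except ValueError as err:
    print(err)  # not enough values to unpack (expected 4, got 2)

# Passing the labels as a second array gives the four expected outputs
X_train, X_test, y_train, y_test = train_test_split(data, y, test_size=0.3, random_state=42)
print(len(X_train), len(X_test))  # 7 3
```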

Upvotes: 1
