MSG

Reputation: 353

How to split training and test sets?

When should we use

X_train, X_test, y_train, y_test = train_test_split(data, test_size=0.3, random_state=42)

and when should we use

train, test = train_test_split(data, test_size=0.3, random_state=0)

The former raises this error:

ValueError: not enough values to unpack (expected 4, got 2)

Upvotes: 8

Views: 4588

Answers (3)

Leoli

Reputation: 749

If you pass one data list, it is split into two:

                             |---data_train
data ----train_test_split()--|
                             |---data_test

If you pass two data lists, EACH list is split into two, giving four in total. The outputs come back in input order, as a train/test pair for each list:

                                       |---data_train_x
                                       |---data_test_x
data_x, data_y ----train_test_split()--|
                                       |---data_train_y
                                       |---data_test_y

The same holds for n data lists: you get 2n outputs.
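The rule above can be checked with a small sketch. The arrays data_x, data_y and data_w are hypothetical toy data made up for illustration; only train_test_split itself comes from scikit-learn.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 samples with 2 features, plus labels and weights.
data_x = np.arange(20).reshape(10, 2)
data_y = np.arange(10)
data_w = np.ones(10)

# One input -> two outputs (train, test).
one = train_test_split(data_x, test_size=0.3, random_state=0)
# Two inputs -> four outputs, ordered x_train, x_test, y_train, y_test.
two = train_test_split(data_x, data_y, test_size=0.3, random_state=0)
# Three inputs -> six outputs.
three = train_test_split(data_x, data_y, data_w, test_size=0.3, random_state=0)

print(len(one), len(two), len(three))  # prints: 2 4 6
```

Every input must have the same number of rows, because the same shuffled row indices are applied to all of them.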

Upvotes: 1

Mihai Alexandru-Ionut

Reputation: 48327

train_test_split accepts as many arrays as arguments as you need, and returns two outputs (train and test) for each of them.

So, since you need four returned values, you have to pass two arrays as arguments.

X_train, X_test, y_train, y_test = train_test_split(data, y_data, test_size=0.3, random_state=42)

If you need to pass many arrays, you can keep them in a list and unpack it into separate arguments with the * operator.

train_test_split(*arrays, test_size=test_size, random_state=0)
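As a rough sketch of the unpacking form, assuming arrays is a list of three equal-length NumPy arrays (the contents and the regrouping into pairs are illustrative choices, not part of the answer):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical inputs: three arrays that share the same number of rows.
arrays = [np.arange(12).reshape(6, 2), np.arange(6), np.arange(6) * 10]

# *arrays passes each list element as a separate positional argument.
splits = train_test_split(*arrays, test_size=0.5, random_state=0)

# Outputs arrive as train/test pairs in input order, so regroup them by input.
pairs = [(splits[i], splits[i + 1]) for i in range(0, len(splits), 2)]
for train_part, test_part in pairs:
    print(train_part.shape, test_part.shape)
```

Regrouping into pairs is optional; it just makes it easier to see which outputs belong to which input.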

Upvotes: 1

MrLeeh

Reputation: 5589

Use the first form if you want to split instances with features (X) and labels (y). Use the second form if you only want to split a single array, such as the features (X) alone.

X_train, X_test, y_train, y_test = train_test_split(data, y, test_size=0.3, random_state=42)

It didn't work for you because you didn't provide the label data in your train_test_split() call: with one input, the function returns only two values, which cannot be unpacked into four names. The above should work well. Just replace y with your label/target data.
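If the labels live in the same table as the features, they have to be separated first. A minimal sketch, assuming data is a 2-D NumPy array whose last column holds the label (that layout is an assumption, since the question does not show the data):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical table: 10 rows, two feature columns and a label column (assumed last).
data = np.column_stack([np.arange(10), np.arange(10) * 2, np.arange(10) % 2])

X = data[:, :-1]  # feature columns
y = data[:, -1]   # label column

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Rows stay aligned: each label still belongs to its own feature row.
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
```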

Upvotes: 1
