leila

Reputation: 551

Does the train_test_split function keep the balance between classes?

I have looked for an answer to this question but couldn't find one.

Suppose I have a dataset labeled with three or more classes, where each class represents about 33% of the data. When I split the data, do the training/validation/test sets keep the same balance between the classes?

If not, is there a way to keep the balance?

Thanks in advance.

Upvotes: 5

Views: 5613

Answers (1)

leila

Reputation: 551

Found it! Pass stratify=y so that the train and test sets keep the same class proportions as y:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
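By default stratify is None, so the split is purely random and the class proportions are only roughly preserved. For the training/validation/test case from the question, one option is to call train_test_split twice and stratify both times. Here is a minimal sketch of that, assuming a made-up toy dataset of 300 samples with three equally sized classes; the names X_rest, y_rest, X_val and y_val are just for illustration:

```python
from collections import Counter

import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (hypothetical): 300 samples, three classes with 100 samples each.
X = np.arange(300).reshape(-1, 1)
y = np.repeat([0, 1, 2], 100)

# Step 1: hold out 20% of the data as the test set, stratified on y.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Step 2: split the remaining 80% into train and validation.
# 0.25 of the remainder equals 20% of the full dataset.
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=42, stratify=y_rest
)

# Count the samples per class in each split to check the balance.
train_counts = Counter(y_train.tolist())
val_counts = Counter(y_val.tolist())
test_counts = Counter(y_test.tolist())

print("train:", sorted(train_counts.items()))
print("val:  ", sorted(val_counts.items()))
print("test: ", sorted(test_counts.items()))
```

With stratification each split should contain the three classes in equal numbers. If you drop stratify from both calls and rerun, the counts will usually drift a little from class to class.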

Upvotes: 10
