I have a question. I have looked for an answer but couldn't find one.
Suppose I have a dataset labeled with three or more classes, all equally represented (for example, three classes at about 33% of the data each). When I split the data, do the training, validation, and test sets keep the same balance between the classes?
If not, is there a way to keep the balance?
Thanks in advance.
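Found it! A plain random split does not guarantee the class balance is preserved, but passing stratify=y to train_test_split makes each split keep the class proportions of y:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)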
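The line above produces only a train/test split, while the question also asks about a validation set. One common approach, sketched here, is to call train_test_split twice and stratify both times. The toy dataset (300 samples, three classes of 100 each) and the 60/20/20 proportions are assumptions chosen for illustration; the second call stratifies on y_temp, the labels that remain after the test set is removed.

```python
from collections import Counter

import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 300 samples, 4 features, 3 balanced classes (100 each).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = np.repeat([0, 1, 2], 100)

# Step 1: hold out 20% as the test set, preserving the class proportions of y.
X_temp, X_test, y_temp, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Step 2: split the remaining 80% into train/validation.
# 0.25 of the remaining 80% is 20% of the full dataset.
X_train, X_val, y_train, y_val = train_test_split(
    X_temp, y_temp, test_size=0.25, random_state=42, stratify=y_temp
)

# Report each split's size and per-class fraction to confirm the balance.
for name, labels in [("train", y_train), ("val", y_val), ("test", y_test)]:
    counts = Counter(labels.tolist())
    fractions = {c: counts[c] / len(labels) for c in sorted(counts)}
    print(name, len(labels), fractions)
```

With stratification, every class should make up roughly one third of each split; without stratify, the per-split proportions can drift, especially on small datasets.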