"Stratify" parameter from sklearn's train_test_split not working correctly?

Question

I have a problem with the stratify parameter of scikit-learn's train_test_split() function. Here is a dummy example that reproduces the problem, which shows up intermittently on my real data:

from sklearn.model_selection import train_test_split
a = [1, 0, 0, 0, 0, 0, 0, 1]
train_test_split(a, stratify=a, random_state=42)

which returns:

[[1, 0, 0, 0, 0, 1], [0, 0]]

Shouldn't the test subset also contain a "1"? As I understand stratify, train_test_split() should return something like:

[[1, 0, 0, 0, 0, 0], [0, 1]]
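
For reference, the proportions behind that expectation can be worked out by hand. This is only a sketch: it assumes the default test fraction of 0.25 (consistent with the 6/2 split shown above) and that the test size is rounded up. The names test_fraction and ideal_test are illustrative, not part of scikit-learn.

```python
import math
from collections import Counter

a = [1, 0, 0, 0, 0, 0, 0, 1]
test_fraction = 0.25  # assumption: default used when test_size is not passed

# Number of samples that end up in the test subset (assumed rounded up).
n_test = int(math.ceil(len(a) * test_fraction))

# Ideal (possibly fractional) number of test samples per class
# if the class proportions were preserved exactly.
counts = Counter(a)
ideal_test = {label: n * test_fraction for label, n in counts.items()}

print(n_test)
print(ideal_test)
```

Under these assumptions the ideal share of the "1" class in a two-element test subset is half a sample, so any actual split has to round it either up or down.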

This happens for some values of random_state, while for others the split comes out as I expect. I can't go hunting for a "right" random_state every time I analyse data.
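
A minimal sketch for checking how the result varies with the seed: it repeats the split above for a handful of random_state values and tallies the class labels in each subset. The range of seeds and the results dictionary are arbitrary choices for illustration, and the output will depend on the installed scikit-learn version.

```python
from collections import Counter
from sklearn.model_selection import train_test_split

a = [1, 0, 0, 0, 0, 0, 0, 1]

# Map each seed to the class counts of its (train, test) subsets.
results = {}
for seed in range(10):  # arbitrary set of seeds to try
    train, test = train_test_split(a, stratify=a, random_state=seed)
    results[seed] = (Counter(train), Counter(test))
    print("random_state=%d test=%s" % (seed, sorted(test)))
```

Comparing the per-seed counts makes it easy to see which seeds put a "1" in the test subset and which do not.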

I am using Python 2.7 and scikit-learn 0.18.
