How to use train_test_split in sklearn to create non-sequential but ordered splits?

Question

I need to split my 2D image data and the corresponding target values (a pandas DataFrame). For example, the targets look like this:

| radius   | distance | omega    |
| -------- | -------- | -------- |
| Cell 1   | Cell 2   | Cell 3   |
| Cell 4   | Cell 5   | Cell 6   |
| Cell 7   | Cell 8   | Cell 9   |
| Cell 10  | Cell 11  | Cell 12  |

and the images are just 2D numpy arrays.

I want the test set to contain non-sequential but ordered indices and to represent about 20% of the data. For example, if I have 10 data points, my test indices might be [0, 2] but never [2, 0].

I've written a function that does this, but it's quite long, and I'd be grateful if anyone knows whether it can be done with the sklearn function train_test_split. I also tried the option shuffle=False, but as far as I can tell it produces a sequential test split. Suggestions for other functions are welcome.
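
One possible approach, sketched here under assumptions: split an array of row positions with train_test_split (which shuffles by default), then sort each resulting index array so the order within each subset is ascending again. The names images and targets, the 4x4 image size, and random_state=0 are hypothetical stand-ins for illustration, not taken from the original data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 10 images of 4x4 pixels and a matching targets DataFrame.
images = np.arange(10 * 4 * 4, dtype=float).reshape(10, 4, 4)
targets = pd.DataFrame({
    "radius": np.arange(10, dtype=float),
    "distance": np.arange(10, 20, dtype=float),
    "omega": np.arange(20, 30, dtype=float),
})

# Split the row positions instead of the data itself (shuffling is on by default).
indices = np.arange(len(targets))
train_idx, test_idx = train_test_split(indices, test_size=0.2, random_state=0)

# Sorting restores ascending order inside each subset.
train_idx = np.sort(train_idx)
test_idx = np.sort(test_idx)

# Use the sorted positions to index both the images and the targets.
X_train, X_test = images[train_idx], images[test_idx]
y_train, y_test = targets.iloc[train_idx], targets.iloc[test_idx]
```

Note that the test positions are drawn at random, so nothing in this sketch prevents two of them from being adjacent by chance; if strictly non-adjacent indices are required, an extra check would be needed.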

Answers (0)