I need to split my 2D image data and the corresponding target values (a pandas DataFrame) into training and test sets. For example, the targets look like this:
| radius | distance | omega |
| -------- | -------- | -------- |
| Cell 1 | Cell 2 | Cell 3 |
| Cell 4 | Cell 5 | Cell 6 |
| Cell 7 | Cell 8 | Cell 9 |
| Cell 10 | Cell 11 | Cell 12 |
and the images are just 2D numpy arrays.
I want the test set to contain indices that are non-sequential (not one contiguous block) but still in ascending order, and to represent about 20% of the data. For example, if I have 10 data points, my test indices might be [0,2] but not [2,0].
I've written a function that does this, but it's quite long, and I'd be grateful to know whether the same thing can be done with sklearn's train_test_split. I also tried the option shuffle=False, but as far as I can tell it produces a sequential test split (one contiguous block of indices). Suggestions for other functions are welcome.
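For reference, here is a minimal sketch of the behaviour I'm after. It is only one possible approach: it calls train_test_split on an array of positions instead of on the data, then sorts the returned positions. The helper name `ordered_split`, the array shapes and the target values are made up for illustration, and it assumes the images are stacked in a single numpy array whose first axis matches the DataFrame rows.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def ordered_split(images, targets, test_size=0.2, seed=0):
    """Randomly pick test rows, then restore ascending order (hypothetical helper)."""
    positions = np.arange(len(targets))
    # Shuffle-split the positions, not the data itself.
    train_idx, test_idx = train_test_split(
        positions, test_size=test_size, random_state=seed
    )
    # Sorting keeps the random selection but puts each subset back in order.
    train_idx = np.sort(train_idx)
    test_idx = np.sort(test_idx)
    return (
        images[train_idx],
        images[test_idx],
        targets.iloc[train_idx],
        targets.iloc[test_idx],
        train_idx,
        test_idx,
    )


# Made-up example data: 10 images of 4x4 pixels and 10 target rows.
images = np.arange(10 * 4 * 4, dtype=float).reshape(10, 4, 4)
targets = pd.DataFrame(
    {
        "radius": np.linspace(1.0, 10.0, 10),
        "distance": np.linspace(0.1, 1.0, 10),
        "omega": np.linspace(5.0, 50.0, 10),
    }
)

X_train, X_test, y_train, y_test, train_idx, test_idx = ordered_split(images, targets)
print(test_idx)
```

With 10 data points and test_size=0.2 this yields 2 test positions in ascending order, and the images and targets stay aligned because both are indexed with the same position arrays.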