Anna

Reputation: 1

Non-overlapping data in train/test/validation split in Python

I'm trying to write a function for a deep learning problem: satellite image classification. I have searched through a lot of libraries and haven't found what I need. I tried scikit-learn, but I feel that it is not what I need.

Any hint for a specialised function that I may have missed?

Upvotes: 0

Views: 1017

Answers (3)

YakovK

Reputation: 377

This seems to be a common problem: a stratify_by option is there but a partition_by option is not. By partition_by I mean that the sets should be non-overlapping on the value of a specific variable, such as video_id or patient_id, so that all samples sharing a value end up in the same set.
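A minimal sketch of what such a partition-by split could look like with plain NumPy. The helper name group_split, its parameters, and the patient_id array are made up for illustration; the fractions apply to the number of distinct groups, not to the number of samples.

```python
import numpy as np

def group_split(groups, frac_train=0.5, frac_val=0.2, seed=0):
    """Return (train, val, test) index arrays that share no group value.

    Hypothetical helper: the fractions are taken over distinct groups,
    so the per-sample proportions depend on the group sizes.
    """
    groups = np.asarray(groups)
    rng = np.random.default_rng(seed)
    # Shuffle the distinct group ids, then cut the shuffled list in three.
    unique_groups = rng.permutation(np.unique(groups))
    n_train = int(len(unique_groups) * frac_train)
    n_val = int(len(unique_groups) * frac_val)
    train_groups = unique_groups[:n_train]
    val_groups = unique_groups[n_train:n_train + n_val]
    test_groups = unique_groups[n_train + n_val:]  # whatever is left over
    # Map each set of group ids back to sample indices.
    train_idx = np.flatnonzero(np.isin(groups, train_groups))
    val_idx = np.flatnonzero(np.isin(groups, val_groups))
    test_idx = np.flatnonzero(np.isin(groups, test_groups))
    return train_idx, val_idx, test_idx

# Example: 10 samples coming from 5 hypothetical patients.
patient_id = np.array([1, 1, 2, 2, 3, 3, 4, 4, 5, 5])
train_idx, val_idx, test_idx = group_split(patient_id)
```

scikit-learn also ships group-aware splitters such as GroupShuffleSplit and GroupKFold in sklearn.model_selection, which may be worth a look before rolling your own.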

Upvotes: 0

pietz

Reputation: 2553

This should do the trick. You can use the permutation array on the X and y data separately if you like.

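import numpy as np

# data is assumed to be a NumPy array: 50% train, 20% validation, remaining 30% test
num_tr, num_va = int(len(data)*0.5), int(len(data)*0.2)
perm = np.random.permutation(len(data))  # shuffled sample indices
tr_data = data[perm[:num_tr]]
va_data = data[perm[num_tr:num_tr+num_va]]
te_data = data[perm[num_tr+num_va:]]  # everything left over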
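To illustrate using the permutation on X and y separately, here is a rough sketch. The arrays X and y are toy data invented for the example, and the 50/20/30 proportions are simply carried over from above.

```python
import numpy as np

# Toy data (assumption): 10 samples with 3 features each, one label per sample.
X = np.arange(30).reshape(10, 3)
y = np.arange(10)

rng = np.random.default_rng(0)
perm = rng.permutation(len(X))  # one shared shuffle of the sample indices
num_tr, num_va = int(len(X) * 0.5), int(len(X) * 0.2)

tr_idx = perm[:num_tr]
va_idx = perm[num_tr:num_tr + num_va]
te_idx = perm[num_tr + num_va:]

# Indexing X and y with the same index arrays keeps samples and labels aligned.
X_tr, y_tr = X[tr_idx], y[tr_idx]
X_va, y_va = X[va_idx], y[va_idx]
X_te, y_te = X[te_idx], y[te_idx]
```

Because the three index slices come from a single permutation, no sample can land in more than one set. Note that this is non-overlapping per sample only; it knows nothing about grouping variables.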

Upvotes: 0

DKDK

Reputation: 302

The sklearn train_test_split seems to fit all your needs.

https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
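train_test_split only produces two sets per call, so one way to get train, validation and test sets is to call it twice. A small sketch, assuming toy arrays X and y and an arbitrary 60/20/20 split chosen for the example:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (assumption): 20 samples with 2 features each, binary labels.
X = np.arange(40).reshape(20, 2)
y = np.arange(20) % 2

# First call: hold out 20% of all samples as the test set.
X_rest, X_te, y_rest, y_te = train_test_split(
    X, y, test_size=0.2, random_state=0
)
# Second call: take 25% of the remainder (20% of the total) for validation.
X_tr, X_va, y_tr, y_va = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=0
)
```

The resulting sets are disjoint per sample. train_test_split has a stratify argument but no grouping argument, so if samples from the same scene or tile must stay together you would need a group-aware approach instead.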

Upvotes: 0
