Grouping data by sklearn.model_selection.GroupShuffleSplit

Question

I have a dataset in a CSV file with the following header:

PRODUCT_ID  CATEGORY_NAME   PRODUCT_TYPE DISPLAY_COLOR_NAME IMAGE_ID

The same product has multiple rows, each with a different IMAGE_ID. I set IMAGE_ID as the index column when reading the CSV into a pandas DataFrame.

I want to create train and test datasets by grouping the data on PRODUCT_TYPE or any other column. I also need to make sure the same data does not appear in both the train and test datasets (since each product has multiple rows with different images).

How can I achieve this using sklearn.model_selection.GroupShuffleSplit?


Answers (1)
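
A minimal sketch of one way to do this, assuming the column names from the header in the question. The sample rows, the helper name group_split, and the test_size and seed values are all hypothetical and only there for illustration. GroupShuffleSplit keeps every row that shares a group value on the same side of the split, so passing PRODUCT_ID as the groups keeps all images of one product together. Passing PRODUCT_TYPE instead puts each whole product type in either train or test.

```python
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical sample data that mirrors the header in the question.
df = pd.DataFrame(
    {
        "PRODUCT_ID": [1, 1, 2, 2, 3, 3, 4, 5, 5, 6],
        "CATEGORY_NAME": ["men"] * 5 + ["women"] * 5,
        "PRODUCT_TYPE": ["shirt", "shirt", "shirt", "shirt", "shoe",
                         "shoe", "shoe", "hat", "hat", "hat"],
        "DISPLAY_COLOR_NAME": ["red", "red", "blue", "blue", "black",
                               "black", "white", "green", "green", "grey"],
        "IMAGE_ID": list(range(100, 110)),
    }
).set_index("IMAGE_ID")


def group_split(frame, group_col, test_size=0.2, seed=42):
    """Split frame so that no value of group_col appears in both parts."""
    splitter = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=seed)
    # split() yields positional row indices, so select rows with iloc.
    train_idx, test_idx = next(splitter.split(frame, groups=frame[group_col]))
    return frame.iloc[train_idx], frame.iloc[test_idx]


# Group on PRODUCT_ID: all images of one product stay on the same side.
train_df, test_df = group_split(df, "PRODUCT_ID")

# Group on PRODUCT_TYPE: each product type lands entirely in train or in test.
train_by_type, test_by_type = group_split(df, "PRODUCT_TYPE")
```

Note that a float test_size here is the fraction of groups placed in the test set, not the fraction of rows, so the row counts of the two parts can differ from the nominal ratio when groups have unequal sizes.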
