Emma Bnz

Reputation: 1

How to split an image dataset into training/validation/test sets in Python?

I am trying to write a simple fingerprint recognition program that will train on a dataset of 80 images. I used the following code to load the data:

import glob
data = glob.glob('/content/drive/MyDrive/DB2_B/*')

How can I split my image dataset into a training set and a test set?

Upvotes: 0

Views: 1415

Answers (1)

Sai Ganesh

Reputation: 21

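I normally use the split-folders package; you can give it a try. The code below puts 80% of the data into the training set and the remaining 20% into the test set, which is what x in the split_data function controls. As far as I know, split-folders names the two output subfolders train and val when it is given a two-value ratio, and it expects the images inside the input directory to be grouped into one subfolder per class. You do not have to create the output directory; the package creates the folders for you. First install the package: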

pip install split-folders 

Try this:

import splitfolders

def split_data(input_dir, output_dir, x):
    # x is the fraction of images copied to the training set; the rest (1 - x) is held out
    splitfolders.ratio(input_dir, output=output_dir, seed=1337, ratio=(x, 1 - x), group_prefix=None)

split_data('./input', './output', 0.8)

I noticed your data is in a separate folder. It would be easier if it were in the same folder as your Python file, but if that is not possible, just change the input directory in the code above. The output folder is created in the current working directory, which is normally where your Python file is. Once the function is defined, you can call it like this:

split_data('/content/drive/MyDrive/DB2_B', './output', 0.8)
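
If the images sit directly in the folder rather than in per-class subfolders, or if you also want a validation set, splitting the list of file paths in plain Python is another option. This is only a minimal sketch and does not use split-folders: the function name split_paths, the 70/15/15 fractions and the placeholder file names are my own illustrative assumptions. In practice you would pass in the list returned by glob.glob from the question.

```python
import random

def split_paths(paths, train_frac=0.7, val_frac=0.15, seed=1337):
    """Shuffle paths reproducibly and cut them into train/val/test lists."""
    if train_frac < 0 or val_frac < 0 or train_frac + val_frac > 1:
        raise ValueError("fractions must be non-negative and sum to at most 1")
    paths = sorted(paths)               # fixed starting order, so the seed decides the split
    random.Random(seed).shuffle(paths)  # local RNG; the global random state is untouched
    n_train = round(len(paths) * train_frac)
    n_val = round(len(paths) * val_frac)
    train = paths[:n_train]
    val = paths[n_train:n_train + n_val]
    test = paths[n_train + n_val:]      # whatever is left goes to the test set
    return train, val, test

# Placeholder names standing in for glob.glob('/content/drive/MyDrive/DB2_B/*')
example_paths = [f"img_{i:02d}.tif" for i in range(80)]
train, val, test = split_paths(example_paths)
print(len(train), len(val), len(test))  # 56 12 12
```

One thing to keep in mind for fingerprints: if the dataset holds several impressions of the same finger, a purely random split can put the same finger in both the training and test sets, so you may prefer to split by finger ID instead.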

Upvotes: 2
