Reputation: 935
I have a folder that contains 400 images. I want to split it into two folders: train_images and test_images.
train_images should contain 150 randomly selected images, all different from each other. test_images should also contain 150 randomly selected images, all different from each other and from the images selected for train_images.
I started by writing code that selects a random set of images from the Faces folder and writes them to the train_images folder. I need help extending it to get the behavior described above.
clear all;
close all;
clc;
Train_images = 'train_faces';
mkdir(Train_images);
ImageFiles = dir('Faces');
ImageFiles = ImageFiles(~[ImageFiles.isdir]); % Drop '.' and '..' (and any subfolders)
totalNumberOfImages = numel(ImageFiles);
scrambledList = randperm(totalNumberOfImages); % Random permutation, so no index repeats
numberIWantToUse = 150;
for index = scrambledList(1:numberIWantToUse)
    baseFileName = ImageFiles(index).name;
    str = fullfile('Faces', baseFileName); % Better than STRCAT; same case as the dir call
    face = imread(str);
    imwrite(face, fullfile(Train_images, ['hello' num2str(index) '.jpg']));
end
Any help would be greatly appreciated.
Upvotes: 0
Views: 952
Reputation: 12689
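Your code looks good to me. When you implement the test, you could re-run scrambledList = randperm(totalNumberOfImages); and select the first 150 elements of scrambledList as you did in the training process, but a fresh permutation can pick images that are already in the training set.
To rule that out, keep the same scrambledList and run the loop again over its next 150 elements: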
for index = scrambledList(numberIWantToUse+1 : 2*numberIWantToUse)
... % same thing you wrote in your training loop
end
With this approach, your test sample will be completely different from the training sample, because the two index ranges come from the same permutation and do not overlap.
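A minimal sketch of this idea, using only base MATLAB: one permutation is sliced into two non-overlapping index ranges, so no image can land in both sets. The counts (400 images, 150 per set) come from the question; the names trainIdx and testIdx are illustrative and not part of the original code.

```matlab
% Sketch: split one random permutation into disjoint train/test index sets.
totalNumberOfImages = 400;   % assumed number of image files (from the question)
numberIWantToUse    = 150;   % size of each set

scrambledList = randperm(totalNumberOfImages);   % each index appears exactly once

trainIdx = scrambledList(1:numberIWantToUse);
testIdx  = scrambledList(numberIWantToUse+1 : 2*numberIWantToUse);

% Sanity checks: no duplicates inside a set and no overlap between the sets.
assert(numel(unique(trainIdx)) == numberIWantToUse);
assert(numel(unique(testIdx))  == numberIWantToUse);
assert(isempty(intersect(trainIdx, testIdx)));
```

The remaining 100 indices, scrambledList(2*numberIWantToUse+1:end), are simply left unused. If the images only need to be copied rather than re-encoded, copyfile(source, destination) may be a lighter alternative to the imread/imwrite pair.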
Upvotes: 1
Reputation: 2449
Supposing that you have the Bioinformatics Toolbox, you can use crossvalind with the HoldOut parameter.
This is an example. train and test are logical arrays, so you can use find to get the actual indexes:
ImageFiles = dir('Faces');
ImageFiles = ImageFiles(~[ImageFiles.isdir]); % Drop '.' and '..' so only files are split
ImageFilesIndexes = ones(1,length(ImageFiles)); % Use a numeric array instead of the char array
proportion = 150/400; % Fraction held out for the testing set
[train,test] = crossvalind('HoldOut',ImageFilesIndexes,proportion);
training_files = ImageFiles(train); % About 250 files: it is better to use more data to train
testing_files = ImageFiles(test); % About 150 files
% Then do whatever you like with the files
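As a small illustration of the find step mentioned above, here is a sketch with a hand-written logical mask standing in for the crossvalind output, so it runs without any toolbox. The six-element mask and the names trainIndexes and testIndexes are made up for the example.

```matlab
% Sketch: turn logical masks into index lists with find.
train = logical([1 0 1 1 0 0]);   % hypothetical mask, true = training file
test  = ~train;                   % the complement is the test mask

trainIndexes = find(train);   % positions of the true entries: 1 3 4
testIndexes  = find(test);    % positions of the true entries: 2 5 6

disp(trainIndexes);
disp(testIndexes);
```

Indexing a struct array such as ImageFiles works with either form, so ImageFiles(train) and ImageFiles(trainIndexes) select the same entries.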
Other possibilities are dividerand (Neural Network Toolbox) and cvpartition (Statistics Toolbox).
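For the cvpartition route, a rough sketch could look like the following. It assumes the Statistics Toolbox is installed and that the folder holds 400 files, as in the question. The subsampling step at the end is an addition, included only to match the 150/150 split the question asks for. The names testMask, trainMask, trainPool, trainIdx and testIdx are illustrative.

```matlab
% Sketch: hold out 150 of 400 files with cvpartition (Statistics Toolbox).
numFiles = 400;                              % assumed file count from the question
c = cvpartition(numFiles, 'HoldOut', 150);   % an integer requests that many test items

testMask  = test(c);       % logical column vector, true for the 150 test files
trainMask = training(c);   % logical column vector, true for the other 250

% To train on only 150 images as in the question, subsample the training pool.
trainPool = find(trainMask);
trainIdx  = trainPool(randperm(numel(trainPool), 150));
testIdx   = find(testMask);
```

Because trainIdx is drawn only from the positions where trainMask is true, it cannot share an index with testIdx.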
Upvotes: 1