G Gr

Reputation: 6090

Sampling data in MATLAB

I have two pieces of data. One is fulldata, a 49625x6 numeric dataset; the other is Book2, a 49625x1 array holding the target class label for each row of fulldata.

Book2 has six names (strings) repeated over and over again to match the fulldata dataset entries. I want to take 1,000 samples from fulldata, of which 25% are "blue" and 75% are "red" according to Book2, and store them in a new subsample named sampledata.

How can I achieve this in MATLAB?

Pseudo Code:

Choose 250 blue samples from Book2. I am not sure how to "choose" 250 random "blue" samples: bluesample = indX(Book2, :) or Book2(indX, :)?

Choose 750 red samples from Book2. Again, I am not sure how to "choose" 750 random "red" samples: redsample = indX(Book2, :) or Book2(indX, :)?

Combine blue and red samples into subsample.

subsample = join(bluesample, redsample)

Find the indices of subsample and create sampledata from fulldata:

sampledata = subsample(indX(fulldata), :)   % this line is probably wrong

This is an image of the two datasets:

[Image: the fulldata and Book2 datasets]

Each row in Book2 matches the corresponding row in fulldata. I want to be able to choose a certain number of "normal" and a certain number of "not normal" rows (yes, I know they are not aptly named) from fulldata using Book2, since Book2 is indexed the same way as fulldata and contains the class labels.

So in terms of my dataset it might be easier to put it this way:

Choose 250 random samples of the string "normal." from Book2 and log the row number.
Choose 750 random samples of the string "not normal." from Book2 and log the row number.
Combine the two random samples of row numbers together.
Make a new dataset (1000x6) using the combined row numbers (above) of fulldata.

Upvotes: 4

Views: 3820

Answers (1)

Dan

Reputation: 45752

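Extract the 'normal.' records using strmatch (newer MATLAB documentation lists strmatch as not recommended; if Book2 is a cell array of strings, find(strcmp(Book2, 'normal.')) returns the same row numbers):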

normIdx = strmatch('normal.', Book2);
normalSubset = fulldata(normIdx, :);

Then, to generate a list of 250 random non-repeating integers, I googled "matlab list of non repeated random integers" and took this from the first result:

p = randperm(size(normalSubset, 1));
p = p(1:250);   % keep the first 250; MATLAB indices start at 1, so no offset is needed

So now, to get your 250 randomly selected normal records:

normalSample = normalSubset(p, :);

normalSample will be 250 x 6. Now do the same with 'not normal.' to get a notNormalSample (750 x 6), and then combine them to get

sample = [normalSample; notNormalSample];
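The 'not normal.' step mentioned above mirrors the 'normal.' one. Here is a minimal sketch, assuming Book2 is a cell array of strings and that at least 750 rows carry the label 'not normal.'. The stand-in data at the top and the names notNormIdx, notNormalSubset and q are made up for illustration so the snippet runs on its own:

```matlab
% Stand-in data for illustration only (replace with the real variables).
fulldata = rand(2000, 6);
Book2 = repmat({'normal.'; 'not normal.'}, 1000, 1);

% Row numbers labelled 'not normal.' (strcmp matches whole strings).
notNormIdx = find(strcmp(Book2, 'not normal.'));
notNormalSubset = fulldata(notNormIdx, :);

% 750 distinct random row positions within that subset.
q = randperm(size(notNormalSubset, 1));
q = q(1:750);
notNormalSample = notNormalSubset(q, :);
```

Note that q indexes into notNormalSubset, not into fulldata, so it cannot be used directly to look up the original row numbers.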

In sample, all the normals will appear before the not normals. If you want to mix them up, use randperm() again:

sample = sample(randperm(1000), :);
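If you also want to keep the row numbers relative to fulldata, as the question asks, one possible variant samples the row numbers themselves and indexes fulldata once at the end. This is only a sketch under assumptions: Book2 is taken to be a cell array of strings, the stand-in data is made up so the snippet runs on its own, the names normRows, notNormRows, pickNorm, pickNot, rows and samplelabels are illustrative, and the two-argument form randperm(n, k) needs a reasonably recent MATLAB release (otherwise take the first k entries of randperm(n) as above):

```matlab
% Stand-in data for illustration only (replace with the real variables).
fulldata = rand(5000, 6);
Book2 = repmat({'normal.'; 'not normal.'; 'not normal.'; 'normal.'}, 1250, 1);

% Row numbers of each class within fulldata.
normRows = find(strcmp(Book2, 'normal.'));
notNormRows = find(strcmp(Book2, 'not normal.'));

% Pick 250 and 750 distinct rows at random from each class.
pickNorm = normRows(randperm(numel(normRows), 250));
pickNot = notNormRows(randperm(numel(notNormRows), 750));

% Combine the row numbers, shuffle them, and index fulldata once.
rows = [pickNorm; pickNot];
rows = rows(randperm(numel(rows)));
sampledata = fulldata(rows, :);
samplelabels = Book2(rows);   % labels stay aligned with sampledata
```

Because rows holds positions in fulldata, the same vector can be reused to pull the matching labels from Book2 or to check afterwards which original records were drawn.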

Upvotes: 1
