user2587726

Reputation: 338

Classifying data based on a training set

I have some data that needs classifying. I've tried to use the classify function described here.

My sample is a matrix that has 1 column and 382 rows.

My training is a matrix with 1 column and 2 rows.

Grouping is causing me the issues. I've written: grouping = [a,b]; where a is one category and b is another.

This gives me the error:

Undefined function or variable 'a'.
Error in discrimtrialab (line 89) 
grouping = [a,b];

Further to this, how do I classify against a range of values for each group, i.e. not just the exact values in training?

Here is my code:

a = -0.09306:0.0001:0.00476;
b = -0.02968:0.0001:0.01484;

%training = groups (odour index)

training = [-0.09306:0.00476; -0.02968:0.01484;];

%grouping variable

group = [a,b]

%classify

 [class, err]  = classify(sample, training, group, 'linear');

 class(a)

(note - there is some processing above this, but it is irrelevant to the question)

Upvotes: 0

Views: 2910

Answers (1)

nkjt

Reputation: 7817

From the documentation:

class = classify(sample,training,group) classifies each row of the data in sample into one of the groups in training. (See Grouped Data.) sample and training must be matrices with the same number of columns. group is a grouping variable for training. Its unique values define groups; each element defines the group to which the corresponding row of training belongs.

That is, "group" must have the same number of rows as training. From the example in the help:

load fisheriris
SL = meas(51:end,1);
SW = meas(51:end,2);
group = species(51:end);

SL & SW are 100 x 1 matrices to be used for training (two different measurements made on each of 100 samples). group is a 100 x 1 cell array of strings indicating which species each of those measurements belongs to. It could also be a char array or simply a list of numbers (1,2,3) where each number refers to a different group, but it must have 100 rows.

For example, if your training matrix were a 100 x 1 matrix of doubles, where the first 50 values belonged to 'a' and the last 50 belonged to 'b', your group variable could be:

group = [repmat('a',50,1);repmat('b',50,1)];
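As a minimal sketch of how that group variable fits into a full call, the following builds a 100 x 1 training vector and classifies two values. The training values are hypothetical (evenly spaced across the two ranges quoted from the comments), `predicted` is an illustrative name chosen to avoid shadowing MATLAB's built-in `class` function, and `classify` requires the Statistics Toolbox.

```matlab
% Hypothetical training data: 50 evenly spaced values per group (assumed, not the asker's data)
trainA = linspace(-0.04416 - 0.0163,  -0.04416 + 0.0163,  50)';   % values labelled 'a'
trainB = linspace(-0.00914 - 0.00742, -0.00914 + 0.00742, 50)';   % values labelled 'b'
training = [trainA; trainB];                                      % 100 x 1

% One label per row of training
group = [repmat('a',50,1); repmat('b',50,1)];                     % 100 x 1 char

% group must have exactly one entry per training row
assert(size(group,1) == size(training,1));

% Values to classify; same number of columns as training
sample = [-0.05; -0.01];

% Linear discriminant analysis; err is the estimated misclassification rate on the training data
[predicted, err] = classify(sample, training, group, 'linear');
disp(predicted)
```

Because group is a char array here, predicted comes back as a char array with one row per row of sample.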

However, if all your "groups" are just non-overlapping ranges as stated here in the comments:

What I want classify to do is work out whether or not each number in "sample" is type A, ie, in the range -0.04416 +/- 0.0163, or type B, with the range -0.00914 +/- 0.00742

then you don't really need classify. To extract the values from sample that lie within some tolerance of a given value:

sample1 = sample(abs(sample-value)<tol);
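Applied to the two ranges quoted above, the same test can label every element of sample in one pass. This is only a sketch: the sample values are made up for illustration, and the names `centreA`, `tolA`, `centreB`, `tolB`, and `label` are hypothetical.

```matlab
% Hypothetical values to label (not the asker's data)
sample = [-0.05; -0.01; -0.03; 0.02];

% Range centres and half-widths taken from the quoted comment
centreA = -0.04416;  tolA = 0.0163;
centreB = -0.00914;  tolB = 0.00742;

% Logical masks: true where a value lies strictly inside each range
isA = abs(sample - centreA) < tolA;
isB = abs(sample - centreB) < tolB;

% Numeric labels: 1 = type A, 2 = type B, 0 = in neither range
label = zeros(size(sample));
label(isA) = 1;
label(isB) = 2;

% The matching values themselves, as in the one-liner above
sampleA = sample(isA);
sampleB = sample(isB);
```

Since the two ranges do not overlap, no value can be assigned to both groups, so the order of the two label assignments does not matter.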

ETA after latest comment: "group" can be a numeric vector, so if you have a training data set that you need to group based on ranges of some variable, something like the following should work (this code is unchecked, but the basic principle should be sound):

% presume "data" is our training data (381 x 3) and "sample" (n x 2) is the data we want to classify
group = zeros(size(data,1),1); % one label per row of data, initially 0 (unlabeled)

% first column is the variable used for grouping; second and third are data equivalent to the entries in "sample"
training = data(:,2:3);

% find where data(:,1) meets whatever our requirements are and label groups with numbers
group(data(:,1)<3)=1;  % group "1" is wherever first column is below 3
group(data(:,1)>7)=2;  % group "2" is wherever first column is above 7
group(group==0)=NaN; % mark remaining rows as missing; classify ignores training rows whose group is NaN

% now we classify "sample" based on "data", which has been split into "training" and "group" variables
class = classify(sample, training, group);
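To make the idea concrete, here is a small self-contained variant with made-up numbers. Everything in `data` and `sample` is hypothetical, the thresholds 3 and 7 are the placeholder values from the code above, and `predicted` is used in place of `class` so that the built-in `class` function is not shadowed. As before, `classify` needs the Statistics Toolbox.

```matlab
% Hypothetical training data: column 1 drives the grouping, columns 2-3 are the features
data = [1.0 0.10 0.20;
        2.0 0.15 0.25;
        2.5 0.12 0.18;
        5.0 0.50 0.50;    % between the thresholds, so it ends up unlabeled
        8.0 0.90 1.00;
        9.0 0.85 1.10;
        9.5 0.95 0.95];

% Hypothetical data to classify: same two feature columns as training
sample = [0.11 0.20;
          0.90 1.00];

training = data(:,2:3);

% Numeric labels derived from ranges of the first column
group = zeros(size(data,1),1);
group(data(:,1) < 3) = 1;
group(data(:,1) > 7) = 2;
group(group == 0) = NaN;   % treated as missing; these training rows are ignored

predicted = classify(sample, training, group);
disp(predicted)
```

With a numeric group vector, predicted is a numeric column with one label per row of sample.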

Upvotes: 1
