zkanoca

Reputation: 9928

How do I pass my data to the svmtrain() function in MATLAB?

I have to write a script using MATLAB which will classify my data.

My data consists of 1051 web pages (rows) and 11000+ words (columns). The matrix holds the word occurrences for each page. The first 230 rows are about a computer science course (to be labeled +1) and the remaining 821 are not (to be labeled -1). I am going to label a small part of these rows (say 30 rows) myself. The SVM will then label the remaining unlabeled rows.

I have found that I could solve my problem using MATLAB's svmtrain() and svmclassify() functions. First I need to create SVMStruct:

SVMStruct = svmtrain(Training,Group)

Then I need to use

Group = svmclassify(SVMStruct,Sample)

The problem is that I do not know what Training and Group are. For Group, MathWorks says:

Grouping variable, which can be a categorical, numeric, or logical vector, a cell vector of strings, or a character matrix with each row representing a class label. Each element of Group specifies the group of the corresponding row of Training. Group should divide Training into two groups. Group has the same number of elements as there are rows in Training. svmtrain treats each NaN, empty string, or 'undefined' in Group as a missing value, and ignores the corresponding row of Training.

And for Training it says:

Matrix of training data, where each row corresponds to an observation or replicate, and each column corresponds to a feature or variable. svmtrain treats NaNs or empty strings in Training as missing values and ignores the corresponding rows of Group.

I want to know how I can adapt my data to Training and Group. I need at least a small code sample.

EDIT

What I do not understand is that in order to get SVMStruct I have to run

SVMStruct = svmtrain(Training, Group);

and in order to get Group I have to run

Group = svmclassify(SVMStruct,Sample);

Also, I still do not understand what Sample should look like.

I am confused.

Upvotes: 4

Views: 7336

Answers (1)

eigenchris

Reputation: 5821

Training would be a matrix with 1051 rows (the webpages/training instances) and 11000 columns (the features/words). I'm assuming you want to test for the existence of each word on a webpage? In this case you could make the entry of the matrix a 1 if the word exists for a given webpage and a 0 if not.

You could initialize the matrix with Training = zeros(1051,11000); but filling the entries would be up to you, presumably done with some other code you've written.
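If the word counts are already in a matrix, the 0/1 presence matrix described above can be derived from it directly. This is only a minimal sketch: `Counts` is a hypothetical stand-in for the real 1051-row occurrence matrix, shrunk here to two pages and three words.

```matlab
% Hypothetical toy occurrence matrix: 2 pages (rows) by 3 words (columns).
Counts = [3 0 1;
          0 0 7];

% An entry becomes 1 if the word occurs at least once on the page, else 0.
Training = double(Counts > 0);
```

Whether to train on the binary matrix or on the raw counts is a modeling choice; either one has the row-per-page, column-per-word shape that svmtrain expects.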

Group is a column vector with one entry for every training instance (webpage) that tells you which of the two classes the webpage belongs to. In your case you would make the first 230 entries a "+1" for computer science and the remaining 821 entries a "-1" for not.

Group = zeros(1051,1);  % gives you a matrix of zeros with 1051 rows and 1 column
Group(1:230) = 1;       % set first 230 entries to +1
Group(231:end) = -1;    % set the rest to -1
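For the scenario in the question, where only about 30 rows are labeled by hand and the rest are to be classified, Training and Group would hold just the hand-labeled rows and Sample would hold the remaining ones. The following is a rough sketch under assumptions, not a definitive implementation: `X` is placeholder data with only 50 columns (made artificially separable so training converges), and the choice of 15 rows per class is hypothetical. Note that the Group passed to svmtrain holds the known labels, while the value returned by svmclassify holds the predicted labels for Sample; they are different vectors, which is why the output is named `Predicted` here.

```matlab
% Placeholder data standing in for the real 1051-by-11000+ matrix
% (assumption: 50 columns, first 230 rows shifted so the classes separate).
rng(0);
X = rand(1051, 50);
X(1:230, :) = X(1:230, :) + 2;

% Rows labeled by hand: 15 from each class (hypothetical choice).
posIdx = randperm(230, 15);            % row numbers within 1..230
negIdx = 230 + randperm(821, 15);      % row numbers within 231..1051
labeledIdx = [posIdx, negIdx];

Training = X(labeledIdx, :);           % 30 rows, one per labeled page
Group = [ones(15, 1); -ones(15, 1)];   % known labels, same order as Training

% Every row that was not labeled by hand goes into Sample.
unlabeledIdx = setdiff(1:1051, labeledIdx);
Sample = X(unlabeledIdx, :);           % 1021 rows to be classified

% Train on the labeled rows, then predict labels for the others.
% On MATLAB releases where svmtrain/svmclassify have been removed,
% fitcsvm and predict are the documented replacements.
SVMStruct = svmtrain(Training, Group);
Predicted = svmclassify(SVMStruct, Sample);   % one +1/-1 label per Sample row
```

`Predicted(k)` is then the label assigned to row `unlabeledIdx(k)` of the original matrix.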

Upvotes: 2
