sj22

Reputation: 101

How can I use WEKA Machine Learning software to classify the following type of data?

I have a .csv file which consists of 10 columns. The first 9 are related to the properties of a particular item, while the 10th column has the "Class" which states which item it is.

I am trying to run the following classifiers -

I am having some trouble proceeding. I am supposed to divide my data so that the first half is used for training and the second half is used to test the results.

I begin by going to the "Explorer" and opening the .csv file. I select all the attributes, including "Class", and then go to the "Classify" tab.

From there, I select "Percentage split", set it to 50%, and simply click "Start" for each of the classifiers (as mentioned before).

So these are the questions -

Can anyone help me with this?

Thanks!

Upvotes: 1

Views: 341

Answers (2)

DAV

Reputation: 756

  • Yes, the method is right (for Weka, anyway).
  • Yes, you need to include the CLASS, particularly for algorithms that require supervised training. It is used to train the algorithm. Without it, how would the trainer know what the answer should be?
  • You can try adjusting the parameters, but you should do this only to get a better response to the TRAINING data. Of course, there is always the possibility of overfitting. If you allow the testing to influence the training, then you have just used the test data as an auxiliary training set; it is no longer test data.
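
To make the last point concrete, here is a minimal Python sketch (not Weka code) of an in-order split that also sets aside part of the training half for parameter tuning, so the test half is never touched until the end. The column names `p1`..`p9`, the sample rows, and the helper `split_in_order` are made up for illustration. As far as I know, Weka's Explorer shuffles the rows before a percentage split unless "Preserve order for % Split" is ticked under "More options...", so check that in your version if the first half must really be the training half.

```python
import csv
import io


def split_in_order(rows, train_fraction=0.5):
    """Split rows in their original order: the first part trains, the rest tests."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# Hypothetical stand-in for the .csv file: 9 property columns plus "Class".
sample_csv = io.StringIO(
    "p1,p2,p3,p4,p5,p6,p7,p8,p9,Class\n"
    + "\n".join(f"{i},0,0,0,0,0,0,0,0,{'A' if i % 2 else 'B'}" for i in range(10))
)
reader = csv.reader(sample_csv)
header = next(reader)
rows = list(reader)

# First half for training, second half for testing, order preserved.
train, test = split_in_order(rows)

# Hold part of the training half back for parameter tuning,
# so the test half never influences training.
fit_rows, tuning_rows = split_in_order(train, train_fraction=0.8)

print(len(fit_rows), len(tuning_rows), len(test))
```

Parameters would be adjusted by comparing results on `tuning_rows`, and `test` would be scored only once, at the very end.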

Someone asked a similar question here: "How to build a good training data set for machine learning and predictions?" They look like different questions but involve the same considerations.

Upvotes: 1

Bella

Reputation: 34

Your question is a little bit too general, but I will try to help:

  1. Make sure that the "Class" column is selected in the "Classify" tab (below "More Options" button)

  2. You can use 2-fold cross-validation, which roughly corresponds to a 50%/50% split (each round trains on one half and tests on the other)

  3. Increase the training set size: use an 80%/20% or even a 90%/10% percentage split instead of 50%/50% (these roughly correspond to 5-fold and 10-fold cross-validation respectively). This may help if you have a small sample size

  4. Choose your classifiers wisely: depending on your problem, you can also use, for example, decision trees (such as J48) and Random Forest.
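
The correspondence between fold counts and split percentages in points 2 and 3 is just arithmetic: each round of k-fold cross-validation trains on (k-1)/k of the rows and tests on the remaining 1/k. A minimal Python sketch of that, assuming folds are taken in file order (the function names are my own, not Weka's):

```python
def train_fraction(k):
    """Fraction of the data used for training in each round of k-fold cross-validation."""
    if k < 2:
        raise ValueError("k-fold cross-validation needs k >= 2")
    return (k - 1) / k


def fold_indices(n_rows, k):
    """Yield (train_indices, test_indices) for each of the k folds, in file order."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_rows) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size


for k in (2, 5, 10):
    share = train_fraction(k)
    print(f"{k}-fold: {share:.0%} train / {1 - share:.0%} test per round")
```

The match is only approximate: a percentage split tests on one held-out portion, while cross-validation rotates through the folds so that every row is tested exactly once.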

Upvotes: 1
