Reputation: 3583
How to load a CSV file without headers in Weka?
There are a few related questions, but none seems to get to the point.
MWE
Here is the test.csv file:
20,1,"+"
30,2,"+"
30,1,"+"
15,1,"-"
10,0,"-"
Here is the Test.java code:
// javac -Xlint -cp weka.jar Test.java && java -cp .:weka.jar Test
import weka.core.converters.CSVLoader;
import weka.core.Instances;
import weka.classifiers.Classifier;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.Evaluation;
import java.io.File;
class Test
{
    public static void main(String[] args) {
        try {
            CSVLoader loader = new CSVLoader();
            loader.setOptions(new String[] {"-H"});
            loader.setSource(new File("test.csv"));
            Instances tr = loader.getDataSet();
            tr.setClassIndex(tr.numAttributes() - 1);
            Classifier m = (Classifier) new NaiveBayes();
            m.buildClassifier(tr);
            Evaluation eval = new Evaluation(tr);
            eval.evaluateModel(m, tr);
            System.out.println(eval.toSummaryString());
        }
        catch(Exception ex) {
            System.out.println(ex);
        }
    }
}
When I run it, the evaluation reports only 4 instances, not 5. If I add a header row to the file, it works correctly.
Correctly Classified Instances 4 100 %
Incorrectly Classified Instances 0 0 %
Kappa statistic 1
Mean absolute error 0.0065
Root mean squared error 0.0112
Relative absolute error 1.3088 %
Root relative squared error 2.2477 %
Total Number of Instances 4
Notice I have used:
loader.setOptions(new String[] {"-H"});
I have also tried the direct API call loader.setNoHeaderRowPresent(true), but it does not seem to be available in Weka 3.6.13.
EDIT: It turns out this was a problem in 3.6.13. The code works fine for 3.7.10.
Upvotes: 3
Views: 2294
Reputation: 3583
As a work-around, this reads the CSV file and passes it along as an ARFF file:
// javac -Xlint -cp weka.jar Test.java && java -cp .:weka.jar Test
import weka.core.Instances;
import weka.classifiers.Classifier;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.Evaluation;
import java.io.FileReader;
import java.io.BufferedReader;
import java.io.StringReader;
class Test
{
    public static void main(String[] args) {
        try {
            String filename = "test.csv";
            BufferedReader br = new BufferedReader(new FileReader(filename));
            String line = br.readLine();
            // Number of columns = number of commas in the first row + 1.
            int cols = line.length() - line.replace(",", "").length() + 1;
            // Build an ARFF header: numeric attributes a, b, ... plus a nominal class.
            StringBuilder arff = new StringBuilder("@RELATION test\n");
            for(int i = 0; i < cols-1; i++) {
                arff.append("@ATTRIBUTE ");
                arff.append(String.valueOf((char)(i + 'a')));
                arff.append(" NUMERIC\n");
            }
            arff.append("@ATTRIBUTE class {+,-}\n");
            arff.append("@DATA\n");
            // Copy every CSV row, including the first one, into the data section.
            while(line != null) {
                arff.append(line);
                arff.append("\n");
                line = br.readLine();
            }
            br.close();
            System.out.println(arff.toString());
            Instances tr = new Instances(new StringReader(arff.toString()));
            tr.setClassIndex(tr.numAttributes() - 1);
            Classifier m = (Classifier) new NaiveBayes();
            m.buildClassifier(tr);
            Evaluation eval = new Evaluation(tr);
            eval.evaluateModel(m, tr);
            System.out.println(eval.toSummaryString());
        }
        catch(Exception ex) {
            System.out.println(ex);
        }
    }
}
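A lighter variant of the same idea would be to keep using CSVLoader and simply prepend a synthetic header row to the CSV text, so that whatever row the 3.6.13 loader consumes as a header is not real data. The sketch below only builds the header and the padded row list with the standard library; the class and method names (SyntheticHeader, countColumns, buildHeader, withHeader) are hypothetical, the column names a, b, ... follow the work-around above, and it assumes no commas inside quoted fields and fewer than 27 columns. Writing the result to a file or stream and passing it to the loader is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SyntheticHeader {
    // Count columns as in the work-around above: number of commas + 1.
    // Assumption: no commas appear inside quoted fields.
    static int countColumns(String firstLine) {
        return firstLine.length() - firstLine.replace(",", "").length() + 1;
    }

    // Build a header row such as "a,b,class" for the given column count.
    static String buildHeader(int cols) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < cols - 1; i++) {
            sb.append((char) ('a' + i)).append(',');
        }
        return sb.append("class").toString();
    }

    // Return a copy of the rows with the synthetic header placed first.
    static List<String> withHeader(List<String> rows) {
        List<String> out = new ArrayList<String>();
        if (rows.isEmpty()) {
            return out;
        }
        out.add(buildHeader(countColumns(rows.get(0))));
        out.addAll(rows);
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
            "20,1,\"+\"", "30,2,\"+\"", "30,1,\"+\"", "15,1,\"-\"", "10,0,\"-\"");
        for (String row : withHeader(rows)) {
            System.out.println(row);
        }
    }
}
```

For the test.csv above, the first printed line is a,b,class, followed by the five original rows unchanged.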
Upvotes: 0
Reputation: 4329
I am not sure about 3.6.13, but the code for 3.7.10 shows that the first row is added to the data if setNoHeaderRowPresent is set to true.
You are setting it to false; set it to true. Reference from the grepcode listing of CSVLoader:
/**
 * Set whether there is no header row in the data.
 *
 * @param b true if there is no header row in the data
 */
public void setNoHeaderRowPresent(boolean b) {
    m_noHeaderRow = b;
}

if (m_noHeaderRow) {
    m_rowBuffer.add(firstRow);
}
So in your code use
loader.setNoHeaderRowPresent(true)
and not loader.setNoHeaderRowPresent(false), so that the first row is included in the data set.
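To make the effect of the flag concrete, here is a toy model of the snippet quoted above. It is not Weka's code: HeaderFlagDemo and dataRows are hypothetical names, and the only behavior it imitates is that the first row goes into the row buffer when the no-header flag is set and is otherwise treated as the header.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class HeaderFlagDemo {
    // Toy stand-in for the loader: returns the rows that end up as data.
    static List<String> dataRows(List<String> lines, boolean noHeaderRow) {
        List<String> rowBuffer = new ArrayList<String>();
        if (lines.isEmpty()) {
            return rowBuffer;
        }
        String firstRow = lines.get(0);
        if (noHeaderRow) {
            // Mirrors the quoted m_rowBuffer.add(firstRow) branch.
            rowBuffer.add(firstRow);
        }
        // All remaining rows are data in both cases.
        rowBuffer.addAll(lines.subList(1, lines.size()));
        return rowBuffer;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "20,1,\"+\"", "30,2,\"+\"", "30,1,\"+\"", "15,1,\"-\"", "10,0,\"-\"");
        System.out.println(dataRows(lines, false).size()); // prints 4
        System.out.println(dataRows(lines, true).size());  // prints 5
    }
}
```

With the flag off, the model yields 4 rows for the question's test.csv, matching the instance count reported there; with it on, all 5 rows are kept.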
Upvotes: 3