Dheya Majid

Reputation: 413

How do I convert text files to .arff format (Weka)?

Please advise me on how to convert text files to .arff format (Weka). I want to do data clustering on 1000 .txt files.

Regards

Upvotes: 2

Views: 7610

Answers (2)

Asad

Reputation: 3042

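Here is code you can use. It reads every .txt file in a directory into a string attribute and writes the resulting dataset as ARFF: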

package text.Classification;

import java.io.*;
import weka.core.*;

// Written against the Weka 3.6 API (FastVector, concrete Instance class).
// In Weka 3.7 and later, Instance is an interface: use
// new DenseInstance(1.0, newInst) instead of new Instance(1.0, newInst).
public class TextDirectoryToArff {

    public Instances createDataset(String directoryPath) throws Exception {
        // Attribute 0: the file contents, as a string attribute
        // (a null FastVector marks the attribute as a string).
        FastVector atts = new FastVector();
        atts.addElement(new Attribute("contents", (FastVector) null));

        // Attribute 1: a nominal class attribute with placeholder labels.
        String[] s = { "class1", "class2", "class3" };
        FastVector attVals = new FastVector();
        for (String p : s)
            attVals.addElement(p);
        atts.addElement(new Attribute("class", attVals));

        Instances data = new Instances("MyRelation", atts, 0);
        System.out.println(data); // prints the (still empty) ARFF header

        File dir = new File(directoryPath);
        String[] files = dir.list();
        if (files == null)
            throw new IOException("Not a readable directory: " + directoryPath);

        for (int i = 0; i < files.length; i++) {
            if (!files[i].endsWith(".txt"))
                continue;
            File txt = new File(dir, files[i]);

            // Read the whole file; try-with-resources closes the reader
            // even if read() throws.
            StringBuilder txtStr = new StringBuilder();
            try (InputStreamReader is = new InputStreamReader(new FileInputStream(txt))) {
                int c;
                while ((c = is.read()) != -1)
                    txtStr.append((char) c);
            }

            double[] newInst = new double[2];
            newInst[0] = data.attribute(0).addStringValue(txtStr.toString());
            // Placeholder: labels are assigned round-robin over all classes.
            // Replace this with your real labels if you have any.
            int j = i % s.length;
            newInst[1] = attVals.indexOf(s[j]);
            data.add(new Instance(1.0, newInst));
        }
        return data;
    }

    public static void main(String[] args) {
        TextDirectoryToArff tdta = new TextDirectoryToArff();
        // try-with-resources flushes and closes the output file.
        try (PrintWriter fileWriter = new PrintWriter(
                "/home/asadul/Desktop/Downloads/text_example/abc.arff", "UTF-8")) {
            Instances dataset = tdta.createDataset(
                    "/home/asadul/Desktop/Downloads/text_example/class5");
            // Instances.toString() renders the dataset in ARFF format.
            fileWriter.println(dataset);
        } catch (Exception e) {
            System.err.println(e.getMessage());
            e.printStackTrace();
        }
    }
}

Upvotes: 0

arutaku

Reputation: 6087

There are some converters implemented in WEKA. Find the one that matches your format, or make small changes to your data (using awk, sed, ...) so that it fits one of them.

Here are the API pages related to this topic: http://weka.sourceforge.net/doc.stable/weka/core/converters/package-summary.html

For example, here is how to convert from CSV to ARFF:

java weka.core.converters.CSVLoader filename.csv > filename.arff
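For plain text files with no labels, as in the clustering case from the question, the target ARFF is simple enough to write by hand. The following is a minimal sketch that does not use Weka at all, not a replacement for Weka's own converters. The class name `PlainTextToArff`, the relation name `text_files`, and the attribute names `filename` and `contents` are assumptions chosen for illustration. It also assumes the files are UTF-8 and that backslash-escaped, single-quoted strings are acceptable to your ARFF reader, which matches the convention Weka uses when it writes string values.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: writes one ARFF row per .txt file, with no class attribute.
public class PlainTextToArff {

    // Wrap a value in single quotes and backslash-escape the characters
    // that would otherwise end the quoted token or the line.
    static String quote(String text) {
        StringBuilder sb = new StringBuilder("'");
        for (char ch : text.toCharArray()) {
            switch (ch) {
                case '\\': sb.append("\\\\"); break;
                case '\'': sb.append("\\'"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:   sb.append(ch);
            }
        }
        return sb.append('\'').toString();
    }

    // Build the ARFF text for every .txt file directly inside dir
    // (subdirectories are not visited).
    static String toArff(Path dir) throws IOException {
        List<Path> files;
        try (Stream<Path> entries = Files.list(dir)) {
            files = entries
                    .filter(p -> p.getFileName().toString().endsWith(".txt"))
                    .sorted()
                    .collect(Collectors.toList());
        }

        StringBuilder arff = new StringBuilder();
        arff.append("@relation text_files\n\n");
        arff.append("@attribute filename string\n");
        arff.append("@attribute contents string\n\n");
        arff.append("@data\n");
        for (Path file : files) {
            // Assumption: the input files are encoded as UTF-8.
            String body = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            arff.append(quote(file.getFileName().toString()))
                .append(',')
                .append(quote(body))
                .append('\n');
        }
        return arff.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java PlainTextToArff <input-dir> <output.arff>");
            return;
        }
        String arff = toArff(Paths.get(args[0]));
        Files.write(Paths.get(args[1]), arff.getBytes(StandardCharsets.UTF_8));
    }
}
```

Run it as `java PlainTextToArff text_example text_example.arff`. Both attributes are string attributes, so the text still has to be turned into numeric features (for example with Weka's StringToWordVector filter) before a clusterer can use it. If the files are already sorted into one subdirectory per class, Weka's own `weka.core.converters.TextDirectoryLoader` covers that labelled case.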

Upvotes: 1
