Reputation: 23214
I have a program that trains an algorithm with a 2-class categorical outcome, then runs and writes out predictions (probabilities of each of the 2 classes) for an unlabeled data set.
All data sets run against this program will have the same 2 classes as the outcome. With this in mind, I ran the predictions, used a little post-hoc statistics to figure out which column of results described which outcome, and then hard-coded the column labels:
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.util.ArrayList;

import weka.classifiers.Classifier;
import weka.core.Instances;

public class runPredictions {
    public static void runPredictions(ArrayList al2) throws Exception {
        // Retrieve objects
        Instances newTest = (Instances) al2.get(0);
        Classifier clf = (Classifier) al2.get(1);
        // Print status
        System.out.println("Generating predictions...");
        // Create copy
        Instances labeled = new Instances(newTest);
        BufferedWriter outFile = new BufferedWriter(new FileWriter("silverbullet_rro_output.csv"));
        StringBuilder builder = new StringBuilder();
        // Hard-coded column labels
        builder.append("Prob_Retain" + "," + "Prob_Attrite" + "\n");
        for (int i = 0; i < labeled.size(); i++) {
            // Class probabilities for instance i, indexed by class value
            double[] clsLabel = clf.distributionForInstance(newTest.instance(i));
            for (int j = 0; j < 2; j++) {
                builder.append(clsLabel[j] + "");
                if (j < clsLabel.length - 1)
                    builder.append(",");
            }
            builder.append("\n");
        }
        outFile.write(builder.toString()); // save the string representation
        System.out.println("Output file written.");
        System.out.println("Completed successfully!");
        outFile.close();
    }
}
The problem is that the mapping between the 2 output columns and the 2 outcome categories turns out not to be fixed. It seems to depend on which category appears first in the training data set, which is entirely arbitrary. So when other data sets were run through this program, the hard-coded labels were backwards.
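To make the suspected cause concrete, here is a plain-Java sketch with no Weka involved. It assumes, as guessed above, that each label gets its index in order of first appearance in the data; `FirstSeenLabelOrder` and `labelOrder` are made-up names for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FirstSeenLabelOrder {
    // Assumption: each distinct label gets the next free index,
    // in the order it is first seen in the data.
    static List<String> labelOrder(List<String> outcomesInFileOrder) {
        List<String> order = new ArrayList<>();
        for (String outcome : outcomesInFileOrder) {
            if (!order.contains(outcome)) {
                order.add(outcome);
            }
        }
        return order;
    }

    public static void main(String[] args) {
        // Same two classes, different first row: the index order flips.
        System.out.println(labelOrder(Arrays.asList("Retain", "Attrite", "Retain")));
        // prints [Retain, Attrite]
        System.out.println(labelOrder(Arrays.asList("Attrite", "Retain", "Retain")));
        // prints [Attrite, Retain]
    }
}
```

If that assumption holds, a header hard-coded as Prob_Retain,Prob_Attrite is only right for data sets of the first kind.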
So I need a better way to label them, but looking at the documentation for Classifier and distributionForInstance, I'm not seeing anything useful.
Update:
I figured out how to print it to the screen (thanks to this), but I still have trouble writing it to CSV:
for (int i = 0; i < labeled.size(); i++)
{
    // Discrete prediction
    double predictionIndex =
        clf.classifyInstance(newTest.instance(i));
    // Get the predicted class label from the predictionIndex.
    String predictedClassLabel =
        newTest.classAttribute().value((int) predictionIndex);
    // Get the prediction probability distribution.
    double[] predictionDistribution =
        clf.distributionForInstance(newTest.instance(i));
    // Print the predicted label and the distribution.
    System.out.printf("%5d: predicted=%-10s, distribution=",
        i, predictedClassLabel);
    // Loop over all the prediction labels in the distribution.
    for (int predictionDistributionIndex = 0;
         predictionDistributionIndex < predictionDistribution.length;
         predictionDistributionIndex++)
    {
        // Get this distribution index's class label.
        String predictionDistributionIndexAsClassLabel =
            newTest.classAttribute().value(
                predictionDistributionIndex);
        // Get the probability.
        double predictionProbability =
            predictionDistribution[predictionDistributionIndex];
        System.out.printf("[%10s : %6.3f]",
            predictionDistributionIndexAsClassLabel,
            predictionProbability);
        // Attempt to write to CSV
        builder.append(i + "," + predictedClassLabel + "," +
            predictionDistributionIndexAsClassLabel + "," + predictionProbability);
        //.charAt(0)+','+predictionProbability.charAt(0));
    }
    System.out.printf("\n");
    builder.append("\n");
}
Upvotes: 1
Views: 698
Reputation: 2811
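I adapted the code below from this answer and this answer. Basically, you can query the test data for its class attribute, then look up the label for each possible class by index. The index into the array returned by distributionForInstance is the same index you pass to classAttribute().value(...).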
for (int i = 0; i < labeled.size(); i++)
{
    // Discrete prediction
    double predictionIndex =
        clf.classifyInstance(newTest.instance(i));
    // Get the predicted class label from the predictionIndex.
    String predictedClassLabel =
        newTest.classAttribute().value((int) predictionIndex);
    // Get the prediction probability distribution.
    double[] predictionDistribution =
        clf.distributionForInstance(newTest.instance(i));
    // Print the predicted label and the distribution.
    System.out.printf("%5d: predicted=%-10s, distribution=",
        i, predictedClassLabel);
    // Loop over all the prediction labels in the distribution.
    for (int predictionDistributionIndex = 0;
         predictionDistributionIndex < predictionDistribution.length;
         predictionDistributionIndex++)
    {
        // Get this distribution index's class label.
        String predictionDistributionIndexAsClassLabel =
            newTest.classAttribute().value(
                predictionDistributionIndex);
        // Get the probability.
        double predictionProbability =
            predictionDistribution[predictionDistributionIndex];
        System.out.printf("[%10s : %6.3f]",
            predictionDistributionIndexAsClassLabel,
            predictionProbability);
        // Write one CSV row per (instance, class label) pair.
        // The newline goes here so consecutive entries do not run together.
        builder.append(i + "," +
            predictionDistributionIndexAsClassLabel + "," + predictionProbability + "\n");
    }
    System.out.printf("\n");
}
// Save results in .csv file
outFile.write(builder.toString()); // save the string representation
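If you would rather keep one row per instance with one probability column per class, the header can be built from the labels instead of being hard-coded. Below is a minimal plain-Java sketch of that idea. The class and method names (`LabeledDistributionCsv`, `header`, `row`) are hypothetical, and the label list stands in for what you would collect in Weka by calling `newTest.classAttribute().value(j)` for each index `j` of the distribution.

```java
import java.util.Arrays;
import java.util.List;

class LabeledDistributionCsv {
    // Builds the header from the class labels in index order, so column j
    // is always named after the label that distribution[j] refers to.
    static String header(List<String> classLabels) {
        StringBuilder sb = new StringBuilder("Instance");
        for (String label : classLabels) {
            sb.append(",Prob_").append(label);
        }
        return sb.toString();
    }

    // Formats one instance's distribution as a CSV row, in the same index order.
    static String row(int instanceIndex, double[] distribution) {
        StringBuilder sb = new StringBuilder(Integer.toString(instanceIndex));
        for (double p : distribution) {
            sb.append(',').append(p);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the labels read from the class attribute (assumed order).
        List<String> labels = Arrays.asList("Attrite", "Retain");
        System.out.println(header(labels));                     // Instance,Prob_Attrite,Prob_Retain
        System.out.println(row(0, new double[] {0.25, 0.75}));  // 0,0.25,0.75
    }
}
```

Because the header and the rows are both driven by the same index order, the columns stay correctly labeled whichever class happens to come first.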
Upvotes: 1