Reputation: 1
I am working on a Java project, a fake news detection application. The dataset contains two columns: Text (the news article) and Label (0 = fake, 1 = genuine). This data is converted to a JSON file. In Java, I used regex to replace all stop words with spaces (" "). Then I worked on vectorization in Java. I faced issues with the built-in vectorization techniques in Weka and Deeplearning4j, so I am now using the "StringToWordVector" filter to vectorize the text. Below is the code for the .java files in my application.
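For context, my stop-word removal step looks roughly like this (the word list shown here is a small illustrative subset, not my full list):

```java
import java.util.regex.Pattern;

public class StopWordRemover {
    // Small illustrative stop-word list; a real application would use a much fuller list
    private static final Pattern STOP_WORDS =
        Pattern.compile("\\b(the|a|an|is|are|was|were|of|and|or|to|in)\\b",
                        Pattern.CASE_INSENSITIVE);

    // Replace each stop word with a space, then collapse runs of whitespace
    public static String removeStopWords(String text) {
        return STOP_WORDS.matcher(text).replaceAll(" ")
                         .replaceAll("\\s+", " ")
                         .trim();
    }
}
```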
DataProcessor.java
package fnd;
import com.fasterxml.jackson.databind.ObjectMapper;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ArffSaver;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
public class DataProcessor {
    public static void main(String[] args) {
        try {
            // Specify the path to your JSON file containing news data
            String jsonFilePath = "src/main/resources/fnd_output.json";
            // Create ObjectMapper instance to read JSON
            ObjectMapper objectMapper = new ObjectMapper();
            // Deserialize JSON array into an array of News objects
            News[] newsArray = objectMapper.readValue(new File(jsonFilePath), News[].class);
            // Prepare attributes for the Instances
            ArrayList<Attribute> attributes = new ArrayList<>();
            attributes.add(new Attribute("text", (ArrayList<String>) null)); // Text attribute as string
            // Define nominal values for the label attribute
            ArrayList<String> labelValues = new ArrayList<>();
            labelValues.add("positive");
            labelValues.add("negative");
            Attribute labelAttribute = new Attribute("label", labelValues); // Label attribute as nominal
            attributes.add(labelAttribute);
            // Create an empty Instances object
            Instances instances = new Instances("TextInstances", attributes, 0);
            // Set the index of the class attribute (label attribute)
            instances.setClassIndex(attributes.size() - 1);
            // Process each News object and add to Instances
            for (News news : newsArray) {
                String processedText = TextPreprocessor.preprocessText(news.getText());
                // Vectorize the processed text
                Instances vectorizedInstance = TextVectorization.vectorizeText(processedText);
                // Create a new Instance
                Instance instance = new DenseInstance(attributes.size());
                // Set the dataset for the instance
                instance.setDataset(instances);
                // Handle text attribute (assuming it's a string attribute)
                Attribute textAttr = attributes.get(0);
                if (textAttr.isString()) {
                    instance.setValue(textAttr, vectorizedInstance.instance(0).stringValue(0));
                } else {
                    System.err.println("Text attribute is not a string attribute.");
                }
                // Handle label attribute (assuming it's a nominal attribute)
                Attribute labelAttr = labelAttribute;
                if (labelAttr.isNominal()) {
                    instance.setValue(labelAttr, news.getLabel());
                } else {
                    System.err.println("Label attribute is not a nominal attribute.");
                }
                // Add the instance to Instances
                instances.add(instance);
            }
            // Output instances to ARFF file
            ArffSaver arffSaver = new ArffSaver();
            arffSaver.setInstances(instances);
            arffSaver.setFile(new File("vectorized_text_with_labels.arff"));
            arffSaver.writeBatch();
            System.out.println("Text vectorization complete with labels. Saved as vectorized_text_with_labels.arff");
        } catch (IOException e) {
            e.printStackTrace();
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}
TextVectorization.java
package fnd;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import weka.core.Attribute;
import weka.core.Instances;
import weka.core.DenseInstance;
import weka.core.converters.ArffSaver;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.StringToWordVector;
import weka.core.Instance;
public class TextVectorization {
    // Method to perform text vectorization (convert string to word vector)
    public static Instances vectorizeText(String text) throws Exception {
        // Create ArrayList to hold attributes
        ArrayList<Attribute> attributes = new ArrayList<>();
        // Create a single attribute named "text"
        Attribute textAttribute = new Attribute("text", (ArrayList<String>) null);
        attributes.add(textAttribute);
        // Create Instances object with the specified attribute
        Instances instances = new Instances("TextInstances", attributes, 0);
        instances.setClass(textAttribute); // Set the class attribute to "text"
        // Create a new Instance with the provided text
        Instance instance = new DenseInstance(instances.numAttributes());
        instance.setValue(textAttribute, text);
        instances.add(instance);
        // Apply StringToWordVector filter to vectorize the text
        StringToWordVector filter = new StringToWordVector();
        filter.setInputFormat(instances);
        Instances vectorizedData = Filter.useFilter(instances, filter);
        return vectorizedData;
    }
    public static void saveInstancesToArff(Instances instances, String filename) throws IOException {
        ArffSaver arffSaver = new ArffSaver();
        arffSaver.setInstances(instances);
        arffSaver.setFile(new File(filename));
        arffSaver.writeBatch();
    }
}
TextPreprocessor.java
package fnd;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class TextPreprocessor {
    private static final Pattern URL_PATTERN = Pattern.compile("http[s]?://\\S+|www\\.\\S+");
    private static final Pattern HTML_TAG_PATTERN = Pattern.compile("<[^>]+>");
    public static String preprocessText(String text) {
        if (text == null || text.isEmpty()) {
            return "";
        }
        // Convert text to lowercase
        text = text.toLowerCase();
        // Remove URLs and HTML tags
        text = removeUrlsAndHtmlTags(text);
        // Keep only letters and whitespace (drops digits and other special characters)
        text = removeSpecialCharacters(text);
        return text;
    }
    private static String removeUrlsAndHtmlTags(String text) {
        Matcher urlMatcher = URL_PATTERN.matcher(text);
        text = urlMatcher.replaceAll("");
        Matcher htmlTagMatcher = HTML_TAG_PATTERN.matcher(text);
        text = htmlTagMatcher.replaceAll("");
        return text;
    }
    private static String removeSpecialCharacters(String text) {
        StringBuilder processedText = new StringBuilder(text.length());
        for (char ch : text.toCharArray()) {
            if (Character.isLetter(ch) || Character.isWhitespace(ch)) {
                processedText.append(ch);
            }
        }
        return processedText.toString();
    }
}
Details about the error, as far as I can tell:
java.lang.IllegalArgumentException: Attribute isn't nominal, string or date!
    at weka.core.AbstractInstance.stringValue(AbstractInstance.java:674)
    at weka.core.AbstractInstance.stringValue(AbstractInstance.java:644)
    at fnd.DataProcessor.main(DataProcessor.java:60)
Commenting out the following line makes the code run:
instance.setValue(textAttr, vectorizedInstance.instance(0).stringValue(0));
How can I vectorize the text and then feed the data into the model?
Upvotes: 0
Views: 77
Reputation: 2608
You are processing your data one document at a time, re-initializing the StringToWordVector filter on every call. The filter therefore produces a different bag of words each time, based only on the content of the single document you just pushed through, so the columns in the vectorized output relate to different words on each call. As a bare minimum fix, you need to add all your textual data to a single weka.core.Instances object and then apply the filter once.
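A minimal sketch of that fix, assuming Weka is on the classpath (the attribute names and the "0"/"1" label values mirror the question's dataset, but are illustrative):

```java
import java.util.ArrayList;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instance;
import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.StringToWordVector;

public class BatchVectorizer {
    // Build ONE Instances object holding every document, then filter once,
    // so all rows share the same bag-of-words columns.
    public static Instances vectorizeAll(String[] texts, String[] labels) throws Exception {
        ArrayList<Attribute> attributes = new ArrayList<>();
        attributes.add(new Attribute("text", (ArrayList<String>) null)); // string attribute
        ArrayList<String> labelValues = new ArrayList<>();
        labelValues.add("0"); // fake
        labelValues.add("1"); // genuine
        attributes.add(new Attribute("label", labelValues));

        Instances data = new Instances("TextInstances", attributes, texts.length);
        data.setClassIndex(1); // label is the class; StringToWordVector leaves it alone
        for (int i = 0; i < texts.length; i++) {
            Instance inst = new DenseInstance(2);
            inst.setDataset(data);
            inst.setValue(0, texts[i]);
            inst.setValue(1, labels[i]);
            data.add(inst);
        }

        // Initialize the filter ONCE on the full dataset and apply it ONCE
        StringToWordVector filter = new StringToWordVector();
        filter.setInputFormat(data);
        return Filter.useFilter(data, filter);
    }
}
```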
But...
Since you are planning on performing classification, you should use StringToWordVector in conjunction with the FilteredClassifier meta-classifier, with your choice of base classifier doing the actual classification. That way, subsequent predictions are preprocessed with the already-initialized StringToWordVector filter. In that scenario, your textual data should be the first attribute and the label associated with that text the second attribute (the class attribute).
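For example (NaiveBayes here is only a placeholder base classifier; substitute whichever Weka classifier you prefer, and again the attribute layout mirrors the question's data):

```java
import java.util.ArrayList;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.meta.FilteredClassifier;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instance;
import weka.core.Instances;
import weka.filters.unsupervised.attribute.StringToWordVector;

public class FilteredTextClassifier {
    // Build the raw dataset: text first, nominal label second (the class attribute)
    public static Instances buildRawDataset() {
        ArrayList<Attribute> attributes = new ArrayList<>();
        attributes.add(new Attribute("text", (ArrayList<String>) null));
        ArrayList<String> labelValues = new ArrayList<>();
        labelValues.add("0");
        labelValues.add("1");
        attributes.add(new Attribute("label", labelValues));
        Instances data = new Instances("News", attributes, 0);
        data.setClassIndex(1);
        return data;
    }

    public static void addDocument(Instances data, String text, String label) {
        Instance inst = new DenseInstance(2);
        inst.setDataset(data);
        inst.setValue(0, text);
        inst.setValue(1, label);
        data.add(inst);
    }

    // Train on RAW text; the meta-classifier owns the StringToWordVector,
    // so later predictions are transformed with the same vocabulary.
    public static FilteredClassifier train(Instances rawData) throws Exception {
        FilteredClassifier fc = new FilteredClassifier();
        fc.setFilter(new StringToWordVector());
        fc.setClassifier(new NaiveBayes()); // placeholder base classifier
        fc.buildClassifier(rawData);
        return fc;
    }
}
```

At prediction time you simply add the raw, unvectorized document to an Instances object with the same two-attribute structure and call classifyInstance on it; the embedded filter handles the transformation.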
Upvotes: 0