Mike

Reputation: 97

How to delete duplicate data in an ArrayList in Java

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

public class TestArticles {

    public static void handlewords() throws IOException {

        String path = "C:\\Features.txt";
        String path1 = "C:\\train.txt";
        String path2 = "C:\\test.txt";

        File file = new File(path2);
        PrintWriter pw = new PrintWriter(file);

        // Features is my own helper class (not shown here)
        Features ft = new Features();
        String content = ft.readFile(path);
        String[] words = content.split(" ");

        FileReader fr = new FileReader(path1);
        BufferedReader br = new BufferedReader(fr);
        String line = null;
        while ((line = br.readLine()) != null) {
            String[] word = line.split(" ");

            List<String> list1 = new ArrayList<String>(words.length);
            List<String> list2 = new ArrayList<String>(word.length);

            // Remove duplicates from the feature words via a HashSet
            for (String s : words) {
                list1.add(s);
                HashSet<String> set = new HashSet<String>(list1);
                list1.clear();
                list1.addAll(set);
            }

            // Remove duplicates from the words on the current line
            for (String x : word) {
                list2.add(x);
                HashSet<String> set = new HashSet<String>(list2);
                list2.clear();
                list2.addAll(set);
            }

            // Write one "index 1" entry for every feature word found on the line
            boolean first = true;
            pw.append("{");
            for (String x : list1) {
                for (String y : list2) {
                    if (x.equalsIgnoreCase(y)) {
                        if (first) {
                            first = false;
                        } else {
                            pw.append(",");
                        }
                        pw.append(list1.indexOf(x) + 39 + " " + "1");
                    }
                }
            }
            pw.append("}");
            pw.append("\r\n");
            pw.flush();
        }
        br.close();
        pw.close();
    }
}

My output file looks something like this:

  1. {23 1,35 1,56 1,56 1,...}
  2. {2 1,4 1,7 1,...}

In the first line some data is duplicated; in the second line all the data is in order with no duplicates. How can I delete the duplicated data? I already used a HashSet, but it did not work.

Upvotes: 1

Views: 81

Answers (2)

janos

Reputation: 124646

The items in your list1 and list2 are correctly unique, but only in a case-sensitive way, so a list can contain both man and Man. In your last loop you then compare with x.equalsIgnoreCase(y), and since "man".equalsIgnoreCase("man") and "man".equalsIgnoreCase("Man") are both true, the same index is written once for each match. That's how the duplicates appear.
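To make the effect concrete, here is a minimal sketch that reproduces it. The class name, the method matchedIndexes and the sample words are made up for illustration, and it uses a LinkedHashSet instead of the question's HashSet only so that the printed order is predictable; both compare elements with equals(), so the behaviour is the same.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

public class CaseDuplicateDemo {

    // Mirrors the nested loop from the question: records the feature index
    // once for every word on the line that matches it, ignoring case.
    static List<Integer> matchedIndexes(List<String> features, List<String> lineWords) {
        List<Integer> hits = new ArrayList<>();
        for (String x : features) {
            for (String y : lineWords) {
                if (x.equalsIgnoreCase(y)) {
                    hits.add(features.indexOf(x));
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical sample data, not taken from the question's files.
        List<String> features = Arrays.asList("dog", "man");

        // A hash-based set compares with equals(), so "man" and "Man" both survive.
        List<String> lineWords = new ArrayList<>(
                new LinkedHashSet<>(Arrays.asList("man", "Man", "man")));

        System.out.println(lineWords);                            // [man, Man]
        System.out.println(matchedIndexes(features, lineWords));  // [1, 1]
    }
}
```

The exact duplicate "man" is removed by the set, but "Man" stays, so the feature at index 1 is reported twice.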

There are several ways to fix that:

  • When you build list1 and list2, lowercase the items
  • Or, use a TreeSet instead of HashSet, with a comparator that ignores case
  • Or, change x.equalsIgnoreCase(y) to x.equals(y), so that matching is case sensitive as well
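
The first two options could look roughly like the sketch below. The class and method names (CaseInsensitiveDedup, dedupLowercase, dedupIgnoreCase) and the sample words are hypothetical, and Locale.ROOT is my own choice for the lowercasing; adapt it to however you build list1 and list2.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

public class CaseInsensitiveDedup {

    // Option 1: lowercase every word before it goes into the set.
    // LinkedHashSet keeps the words in first-seen order.
    static List<String> dedupLowercase(String[] words) {
        Set<String> unique = new LinkedHashSet<>();
        for (String w : words) {
            unique.add(w.toLowerCase(Locale.ROOT));
        }
        return new ArrayList<>(unique);
    }

    // Option 2: a TreeSet whose comparator ignores case.
    // The first spelling seen is kept; the result is sorted ignoring case.
    static List<String> dedupIgnoreCase(String[] words) {
        Set<String> unique = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        for (String w : words) {
            unique.add(w);
        }
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        // Hypothetical sample data.
        String[] words = {"Man", "dog", "man", "DOG"};

        System.out.println(dedupLowercase(words));   // [man, dog]
        System.out.println(dedupIgnoreCase(words));  // [dog, Man]
    }
}
```

Note that the TreeSet version changes the order of the items, which matters if you rely on list1.indexOf(x) for the feature number; the lowercase version keeps the order in which the words first appear.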

Upvotes: 2

DanW

Reputation: 247

Try overriding equals on your HashSets, like this:

HashSet set = new HashSet(list1){
    public boolean equals(Object o) {
        return this.toString().equals(o.toString());
    };
};

Upvotes: 1
