Nick Div
Nick Div

Reputation: 5628

What is the fastest method to find duplicates from a collection

This is what I have tried and somehow I get the feeling that this is not right or this is not the best performing application, so is there a better way to do the searching and fetching the duplicate values from a Map or as a matter of fact any collection. And a better way to traverse through a collection.

public class SearchDuplicates{
    public static void main(String[] args) {
        Map<Integer, String> directory=new HashMap<Integer, String>();
        Map<Integer, String> repeatedEntries=new HashMap<Integer, String>();

        // adding data
        directory.put(1,"john");
        directory.put(2,"michael");
        directory.put(3,"mike");
        directory.put(4,"anna");
        directory.put(5,"julie");
        directory.put(6,"simon");
        directory.put(7,"tim");
        directory.put(8,"ashley");
        directory.put(9,"john");
        directory.put(10,"michael");
        directory.put(11,"mike");
        directory.put(12,"anna");
        directory.put(13,"julie");
        directory.put(14,"simon");
        directory.put(15,"tim");
        directory.put(16,"ashley");

        for(int i=1;i<=directory.size();i++) {
           String result=directory.get(i);
           for(int j=1;j<=directory.size();j++) {
              if(j!=i && result==directory.get(j) &&j<i) {
                 repeatedEntries.put(j, result);
              }
           }
           System.out.println(result);
        }
        for(Entry<Integer, String> entry : repeatedEntries.entrySet()) {
           System.out.println("repeated "+entry.getValue());   
        }
   }
}

Any help would be appreciated. Thanks in advance

Upvotes: 3

Views: 4132

Answers (4)

Ted Hopp
Ted Hopp

Reputation: 234795

You can use a Set to determine whether entries are duplicate. Also, repeatedEntries might as well be a Set, since the keys are meaningless:

Map<Integer, String> directory=new HashMap<Integer, String>();
Set<String> repeatedEntries=new HashSet<String>();
Set<String> seen = new HashSet<String>();

// ... initialize directory, then:

for(int j=1;j<=directory.size();j++){
    String val = directory.get(j);
    if (!seen.add(val)) {
        // if add failed, then val was already seen
        repeatedEntries.add(val);
    }
}

At the cost of extra memory, this does the job in linear time (instead of quadratic time of your current algorithm).

EDIT: Here's a version of the loop that doesn't rely on the keys being consecutive integers starting at 1:

for (String val : directory.values()) {
    if (!seen.add(val)) {
        // if add failed, then val was already seen
        repeatedEntries.add(val);
    }
}

That will detect duplicate values for any Map, regardless of the keys.

Upvotes: 6

DarkHorse
DarkHorse

Reputation: 2770

You can use Collection.frequency to find all possible duplicates in any collection using

Collections.frequency(list, "a")

Here is a proper example

Most generic method to find

Set<String> uniqueSet = new HashSet<String>(list);
    for (String temp : uniqueSet) {
        System.out.println(temp + ": " + Collections.frequency(list, temp));
    }

References from above link itself

Upvotes: 1

Sabbir
Sabbir

Reputation: 126

List, Vector have a method contains(Object o) which return Boolean value based either this object is exist in collection or not.

Upvotes: 1

Evgeniy Dorofeev
Evgeniy Dorofeev

Reputation: 135992

You can use this to found word count

    Map<String, Integer> repeatedEntries = new HashMap<String, Integer>();
    for (String w : directory.values()) {
        Integer n = repeatedEntries.get(w);
        n = (n == null) ? 1 : ++n;
        repeatedEntries.put(w, n);
    }

and this to print the stats

    for (Entry<String, Integer> e : repeatedEntries.entrySet()) {
        System.out.println(e);
    }

Upvotes: 1

Related Questions