Reputation: 1673

Efficient way to find duplicate items among lists in Groovy

I have groovy map built in following manner :

list1 = [ "val1" "val2" "val3" ]
list2 = [ "val7" "val8" ]
list3 = [ "val4" "val5" "val2" ]
list4 = [ "val6" "val4" "val3" ]
map["key1"] = list1
map["key2"] = list2
map["key3"] = list3
map["key4"] = list4

I need to iterate through map and compare each list with another lists in the map and if any of the item from a list matches with item with any of other list then throw an error for e.g. in above case,as list1's value ( val2 ) matches with list3, so it should throw an error and stop processing further in the map.

I can intersect 2 lists to find the duplicates but for this I may need to iterate over the map twice, for e.g. for each key , fetch the values and then iterate again the map to fetch lists for other keys and intersect one by one from first list to find the duplicates OR something similar along this direction but that would not be an efficient way to achieve this.

Can this be achieved in more efficient way in groovy?

Upvotes: 2

Answers (4)

IWilms

Reputation: 590

With slight modifications @ycr's solution can be improved. The problem is that each execution of disjoint iterates over the list merged while merged grows from step to step. So in case of m lists with n elements each this would lead to O(m²n log(n)) time complexity or at least O(m²n) if you (reasonably) neglect the log term for using TreeSet in disjoint.

Instead, we can use a HashSet for merged, iterate over each element in each list and check if it exists in merged. If we assume that finding an element in a HashSet takes O(1), we need O(mn) for that. Note that mn is the number of elements of all lists combined, so the time complexity of this method is linear in the number of all elements.

Set merged = []
map.each { key, val ->
    println "KEY: $key VAL: $val"
    if(val.find{ it in merged }) {
        throw new Exception("Duplicates!!!")
    }
    merged.addAll(val)
}

Upvotes: 0

ycr

Reputation: 14574

How about something like the below using disjoint

def list1 = [ "val1", "val2" ,"val3" ]
def list2 = [ "val7", "val8" ]
def list3 = [ "val4", "val5", "val24" ]
def list4 = [ "val6", "val4", "val3" ]

def map = [:]

map["key1"] = list1
map["key2"] = list2
map["key3"] = list3
map["key4"] = list4

def merged = []

map.each { key, val ->
    println "KEY: $key VAL: $val"
    if(!merged.disjoint(val)) {
         throw new Exception("Duplicates!!!")
    }
    merged.addAll(val) 
}

Here the time complexity of disjoint is O(n). Also, it's worth noting that between two lists a and b if a.size() < b.size() the time complexity will be O(a).

Upvotes: 2

Dale

Reputation: 1628

I would put them into a List. You'd only have to loop through the list of Strings once. As you are putting the items into the List first do a check to see if the item is in the list.

import java.util.ArrayList;
import java.util.List;

public class Main {

    List<String> mergedList = new ArrayList<>();
    public String stringList1[] = { "12q3", "21351", "1312", "12331", "34554", "cfA23" };
    public String stringList2[] = { "12ASD3", "21351", "13\32", "12F331", "345D4", "cfF23" };

    public static void main(String args[]) {
        Main main = new Main();
        main.start();
    }

public void start() {
    List<String[]> listOfLists = new ArrayList<>();
    listOfLists.add(stringList1);
    listOfLists.add(stringList2);

    for (String[] strings : listOfLists) {
        checkForDuplicates(strings);
    }

}

public void checkForDuplicates(String[] listOfStrings) {
    for (String currentString : listOfStrings) {
        if (mergedList.contains(currentString)) {
            System.out.println("Duplicated Item:currentString");
            continue;
        }
        System.out.println("Added:" + currentString);
        mergedList.add(currentString);
    }

}
}

Upvotes: 0

Bruno Grieder

Reputation: 29814

Iterating each element of a list over all elements of another list is O(n^2)

Instead, sort the lists. This O(n.log(n)).

You can then iterate over all the lists at the same time in O(n)

So asymptotically, the algorithm is going to be O(n.log(n))

If n is small, there may be a smarter way to do this, in particular if one of the list is really small. In that case the “brute force” algorithm in O(n^2) may fail faster on average, but, in any case, if n is small, optimizing is not worth: program what is the easiest to understand by the maintainer.

Upvotes: 0

Efficient way to find duplicate items among lists in Groovy

Answers (4)

Related Questions