namalfernandolk
namalfernandolk

Reputation: 9134

What is the best (performance + memory) between String and HashSet for checking duplicates

I want to do a simple implementation to do some operations based on a distinct Codes (aCode) among the bigCodeList contains duplicates. Below I have mentioned two approaches what i want to know is what is the more effective one among them in performance vice + memory consumption wise?


Approach 1 :

    String tempStr = "";

    for(String aCode : bigCodeList){
        if(tempStr.indexOf(aCode) == -1) {
            // deal With the aCode related work
            tempStr += aCode+"-"
        }
    }

Approach 2 :

        HashSet<String> tempHSet = new HashSet<String>();

        for(String aCode : bigCodeList){

            if(tempHSet.add(aCode)){

                // deal With the aCode related work

            }

        }

Note : aCode is a Three Letter code like LON

Upvotes: 2

Views: 159

Answers (4)

Haseena
Haseena

Reputation: 53

Performance and memory wise Hashset is best than String to use in ur coding.

Appending values into string variable will take time

Upvotes: 0

Andremoniy
Andremoniy

Reputation: 34900

I think, that the first one approach is worse: indexOf operation has O(n), while for HashSet complexity could be O(1) for unique String keys look-up.

Furthermore, in the first approach you are using string concatenation operation, which implies new String object creation each time, which gives additional performance draw.

Upvotes: 1

Joachim Sauer
Joachim Sauer

Reputation: 308031

Approach 2 is by far better. You should not even consider approach 1.

First of all, approach 1 has linear time in searching. That means that when tempStr becomes twice as long, the time to search it becomes twice as long (on average, of course, if you always find the first element, it stays short).

Next: you're copying the entire tempStr each time your appending to it (because String objects are immutable and that's the only way to create a new one from an existing one). So the adding option takes ages as well.

Third (not a performance concern): Mixing data (aCode) and meta-data (the separator -) like this leads to all kinds of undesired effects. You might be sure that now aCode can never contain a dash, but what if that changes in two weeks?

Fourth: HashSet is built for pretty much exactly this use case! That's what it does best: hold a set of distinct objects, check if it already exists and add a new one.

Upvotes: 7

Holger
Holger

Reputation: 496

A java.util.Set will not allow duplicates, but it's fairly "quiet" about rejecting duplicates.

Upvotes: 0

Related Questions