Shakhawat95
Shakhawat95

Reputation: 523

From set and dictionary, which is more efficient to find the first duplicate in a list?

I have to find the first duplicate element in a list. Suppose I have a list [1, 2, 3, 4, 3, 1, 2, 9]. The first duplicate here is 3 because when we traverse it from left to right, the 2nd 3 comes first before 2nd 1 and 2.

For solving this problem I have two solutions.

Using set

def firstDuplicateValue(array):
    duplicate = set()
    for value in array:
        if value in duplicate:
            return value
        else:
            duplicate.add(value)
    return -1

Using dictionary

def firstDuplicateValue(array):
    duplicate = {}
    for value in array:
        if value in duplicate:
            return value
        else:
            duplicate[value] = True
    return -1

Which one is more time efficient in python and why? Is there any other best solution?

Upvotes: 0

Views: 269

Answers (1)

Homayoun Hamedmoghadam
Homayoun Hamedmoghadam

Reputation: 1267

In a discussion which can be found in the comments, someone correctly pointed out that the time- and memory-efficiency of the above approaches change with the size of the array to be looked up.

Both Set and Dictionary use a colelction of unique keys, so having similar logic, looking up these keys should have O(n) complexity and you they should not take significantly different timex. The code below, tests these using a list of integers 0, 1, 2, ... and another 0 at the end. Also, note that in the comments below, some have suggested other (perhaps better) approaches to time the processes.

import time
import sys

def firstDuplicateValueSet(array):
    duplicate = set()
    for value in array:
        if value in duplicate:
            print(sys.getsizeof(duplicate)/1024)
            return value
        else:
            duplicate.add(value)
    return -1

def firstDuplicateValueDict(array):
    duplicate = {}
    for value in array:
        if value in duplicate:
            print(sys.getsizeof(duplicate)/1024)
            return value
        else:
            duplicate[value] = True
    return -1

size = 50000000
arr = [i for i in range(size)]
arr.append(0)

tic = time.time()
firstDuplicateValueSet(arr)
toc = time.time()
print(toc-tic)
tic = time.time()
firstDuplicateValueDict(arr)
toc = time.time()
print(toc-tic)
tic = time.time()

Memory and time for firstDuplicateValueSet():

>>> 2097152.21875
>>> 8.740191221237183

Memory and time for firstDuplicateValueDict():

>>> 2621440.1015625
>>> 9.246715545654297

Setting size to 81000000 results in the opposite results. Less time for Dictionary and more memory for Set!

>>> 4194304.21875 #set memory
>>> 15.37313985824585 #set time
>>> 2621440.1015625 #dictionary memory
>>> 11.566540002822876 #dictionary time

Upvotes: 1

Related Questions