Reputation: 523
I have to find the first duplicate element in a list. Suppose I have a list [1, 2, 3, 4, 3, 1, 2, 9]
. The first duplicate here is 3 because when we traverse it from left to right, the 2nd 3 comes first before 2nd 1 and 2.
For solving this problem I have two solutions.
Using set
def firstDuplicateValue(array):
duplicate = set()
for value in array:
if value in duplicate:
return value
else:
duplicate.add(value)
return -1
Using dictionary
def firstDuplicateValue(array):
duplicate = {}
for value in array:
if value in duplicate:
return value
else:
duplicate[value] = True
return -1
Which one is more time efficient in python and why? Is there any other best solution?
Upvotes: 0
Views: 269
Reputation: 1267
In a discussion which can be found in the comments, someone correctly pointed out that the time- and memory-efficiency of the above approaches change with the size of the array to be looked up.
Both Set and Dictionary use a colelction of unique keys, so having similar logic, looking up these keys should have O(n) complexity and you they should not take significantly different timex. The code below, tests these using a list of integers 0, 1, 2, ... and another 0 at the end. Also, note that in the comments below, some have suggested other (perhaps better) approaches to time the processes.
import time
import sys
def firstDuplicateValueSet(array):
duplicate = set()
for value in array:
if value in duplicate:
print(sys.getsizeof(duplicate)/1024)
return value
else:
duplicate.add(value)
return -1
def firstDuplicateValueDict(array):
duplicate = {}
for value in array:
if value in duplicate:
print(sys.getsizeof(duplicate)/1024)
return value
else:
duplicate[value] = True
return -1
size = 50000000
arr = [i for i in range(size)]
arr.append(0)
tic = time.time()
firstDuplicateValueSet(arr)
toc = time.time()
print(toc-tic)
tic = time.time()
firstDuplicateValueDict(arr)
toc = time.time()
print(toc-tic)
tic = time.time()
Memory and time for firstDuplicateValueSet():
>>> 2097152.21875
>>> 8.740191221237183
Memory and time for firstDuplicateValueDict():
>>> 2621440.1015625
>>> 9.246715545654297
Setting size
to 81000000
results in the opposite results. Less time for Dictionary and more memory for Set!
>>> 4194304.21875 #set memory
>>> 15.37313985824585 #set time
>>> 2621440.1015625 #dictionary memory
>>> 11.566540002822876 #dictionary time
Upvotes: 1