Reputation: 7967
I was recently trying to solve this problem on Hackerrank. I ended up with the following solution which gives the right answer within the given time limit:
from collections import Counter
lisa=[]
for each in range(input()):
    current=raw_input()
    count=0
    lookup_dict={}
    i=0
    for i in range(len(current)):
        for j in range(i,len(current)):
            sub_string=current[i:j+1]
            sub_string=''.join(sorted(sub_string))
            if sub_string in lookup_dict.keys():
                lookup_dict[sub_string]+=1
            else:
                lookup_dict[sub_string]=1
    for k in lookup_dict.values():
        count+=k*(k-1)/2
    print count
My problem is that the solution they provide (reproduced below) is significantly faster than mine, even though the complexity and the method used are the same.
from collections import *
for i in range(input()):
    s = raw_input()
    check = defaultdict(int)
    l = len(s)
    for i in range(l):
        for j in range(i+1,l+1):
            sub = list(s[i:j])
            sub.sort()
            sub = "".join(sub)
            check[sub]+=1
    sum = 0
    for i in check:
        n = check[i]
        sum += (n*(n-1))/2
    print sum
My guess is that it has something to do with the defaultdict, but I can't figure out why.
Upvotes: 3
Views: 120
Reputation: 8572
Talking about defaultdict: it is optimized a bit compared to a plain key check. I.e.:
x = defaultdict(int)
for i in xrange(...):
    x[i] += 1
performs ~50% faster than
x = {}
for i in xrange(...):
    if i in x:
        x[i] += 1
    else:
        x[i] = 1
in the case where all keys are missing.
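That gap can be checked with a quick microbenchmark (a sketch in Python 3, using `range` instead of Python 2's `xrange`; exact numbers vary by machine and interpreter):

```python
from collections import defaultdict
from timeit import timeit

def with_defaultdict(n):
    # All keys missing on first access: __missing__ supplies int() == 0.
    x = defaultdict(int)
    for i in range(n):
        x[i] += 1
    return x

def with_key_check(n):
    # Explicit membership test before every write.
    x = {}
    for i in range(n):
        if i in x:
            x[i] += 1
        else:
            x[i] = 1
    return x

n = 10_000
print("defaultdict:", timeit(lambda: with_defaultdict(n), number=100))
print("key check:  ", timeit(lambda: with_key_check(n), number=100))
```

Both variants build the same counts; only the per-key bookkeeping differs.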
But the main issue is that calling dict.keys() in Python 2 actually creates a list. So checking key in dict.keys() has to allocate memory for that list first, populate it with the actual key values, and only then scan it for your key. What is worse, right after that the list has to be cleaned up by the garbage collector, and on the next step of the for loop you do the same again, except more memory is allocated because the list has grown. That is what actually kills performance in your example.
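The cost of rebuilding that list on every check can be seen directly (a sketch in Python 3, where `list(d)` mimics what Python 2's `d.keys()` did; timings are illustrative):

```python
from timeit import timeit

for size in (1_000, 10_000):
    d = dict.fromkeys(range(size))
    probe = size - 1  # worst case for a list scan: the last key
    # Py2-style check: materialize all keys as a list, then scan it linearly.
    t_list = timeit(lambda: probe in list(d), number=1_000)
    # Direct check: a single hash lookup, no allocation.
    t_dict = timeit(lambda: probe in d, number=1_000)
    print(size, t_list, t_dict)
```

The list variant slows down roughly in proportion to the dictionary size, while the direct `in d` check stays flat.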
Upvotes: 1
Reputation: 90899
In Python 2.x, dict.keys()
returns a list, and in your first solution you are actually doing -
if sub_string in lookup_dict.keys()
This is an O(n) operation (n being the size of the dictionary lookup_dict
), since .keys()
actually returns a list, a containment check on a list is O(n), and there is also the added cost of creating that list.
In the second approach there is no such check; instead, defaultdict sets the default value automatically for you, and that is most likely why your first solution is significantly slower than the second (given that you do the dict.keys()
lookup in the innermost loop, so that lookup happens that many times).
Example showing dict.keys()
returning list -
>>> d = {1:2,3:4,5:6,7:8}
>>> d.keys()
[1, 3, 5, 7]
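Dropping the `.keys()` call fixes it: `sub_string in lookup_dict` is a hash lookup, and collections.Counter (which the question already imports) removes the membership test entirely. A sketch of the inner loop rewritten this way (Python 3 syntax; `count_anagram_pairs` is a name chosen here for illustration):

```python
from collections import Counter

def count_anagram_pairs(s):
    """Count pairs of substrings of s that are anagrams of each other."""
    lookup = Counter()
    for i in range(len(s)):
        for j in range(i, len(s)):
            # Sorted characters form a canonical key shared by all anagrams.
            key = ''.join(sorted(s[i:j + 1]))
            lookup[key] += 1  # no `in lookup.keys()` scan needed
    # k occurrences of one key give k*(k-1)/2 anagram pairs.
    return sum(k * (k - 1) // 2 for k in lookup.values())

print(count_anagram_pairs('abba'))  # -> 4
```

The O(n) containment check per substring becomes O(1) amortized, without changing the overall algorithm.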
Upvotes: 4