Reputation: 9067
I've created this script to compute the string similarity in python. Is there any way I can make it run any faster?
tries = input()
while tries > 0:
mainstr = raw_input()
tot = 0
ml = len(mainstr)
for i in xrange(ml):
j = 0
substr = mainstr[i:]
ll = len(substr)
for j in xrange(ll):
if substr[j] != mainstr[j]:
break
j = j + 1
tot = tot + j
print tot
tries = tries - 1
EDIT: After applying some optimization this is the code, but it's not enough!
tries = int(raw_input())
while tries > 0:
mainstr = raw_input()
tot = 0
ml = len(mainstr)
for i in xrange(ml):
for j in xrange(ml-i):
if mainstr[i+j] != mainstr[j]:
break
j += 1
tot += j
print tot
tries = tries - 1
EDIT 2: The third version of the code. It's still no go!
def mf():
tries = int(raw_input())
for _ in xrange(tries):
mainstr = raw_input()
tot = 0
ml = len(mainstr)
for i in xrange(ml):
for j in xrange(ml-i):
if mainstr[i+j] != mainstr[j]:
break
j += 1
tot += j
print tot
mf()
Upvotes: 1
Views: 190
Reputation: 3190
For such simple numeric scripts there are just two things you have to do:
Use PyPy (it does not have complex dependencies and will be massively faster)
Put most of the code in a function. That speeds up stuff for both CPython and PyPy quite drastically. Instead of:
some_code
do:
def main():
some_code
if __name__ == '__main__':
main()
That's pretty much it.
Cheers, fijal
Upvotes: 1
Reputation: 414745
You could improve it by a constant factor if you use i = mainstr.find(mainstr[0], i+1)
instead of checking all i
. Special case for i==0 also could help.
Put the code inside a function. It also might speed up things by a constant factor.
Use for ... else: j += 1
to avoid incrementing j
at each step.
Try to find a better than O(n**2) algorithm that exploits the fact that you compare all suffixes of the string.
The most straight-forward C implementation is 100 times faster than CPython (Pypy is 10-30 times faster) and passes the challenge:
import os
def string_similarity(string, _cp=os.path.commonprefix):
return sum(len(_cp([string, string[i:]])) for i in xrange(len(string)))
for _ in xrange(int(raw_input())):
print string_similarity(raw_input())
The above optimizations give only several percents improvement and they are not enough to pass the challenge in CPython (Python time limit is only 8 time larger).
There is almost no difference (in CPython) between:
def string_similarity(string):
len_string = len(string)
total = len_string # similarity with itself
for i in xrange(1, len_string):
for n, c in enumerate(string[i:]):
if c != string[n]:
break
else:
n += 1
total += n
return total
And:
def string_similarity(string):
len_string = len(string)
total = len_string # similarity with itself
i = 0
while True:
i = string.find(string[0], i+1)
if i == -1:
break
n = 0
for n in xrange(1, len_string-i):
if string[i+n] != string[n]:
break
else:
n += 1
total += n
return total
Upvotes: 2
Reputation: 43054
Here's mine. It passes the test case, but may not be the absolute fastest.
import sys
def simstring(string, other):
val = 0
for l, r in zip(string, other):
if l != r:
return val
val += 1
return val
dsize = sys.stdin.readline()
for i in range(int(dsize)):
ss = 0
string = sys.stdin.readline().strip()
suffix = string
while suffix:
ss += simstring(string, suffix)
suffix = suffix[1:]
sys.stdout.write(str(ss)+"\n")
Upvotes: 0
Reputation: 363797
You can skip the memory allocation inside the loop. substr = mainstr[i:]
allocates a new string unnecessarily. You only use it in substr[j] != mainstr[j]
, which is equivalent to mainstr[i + j] != mainstr[j]
, so you don't need to build substr
.
Memory allocations are expensive, so you'll want to avoid them in tight loops.
Upvotes: 2