Reputation: 9067

How to optimize this python script further?

I've created this script to compute the string similarity in python. Is there any way I can make it run any faster?

tries = input()
while tries > 0:
    mainstr = raw_input()
    tot = 0
    ml = len(mainstr)
    for i in xrange(ml):
        j = 0
        substr = mainstr[i:]
        ll = len(substr)
        for j in xrange(ll):
            if substr[j] != mainstr[j]:
                break
            j = j + 1
        tot = tot + j
    print tot
    tries = tries - 1

EDIT: After applying some optimization this is the code, but it's not enough!

tries = int(raw_input())
while tries > 0:
    mainstr = raw_input()
    tot = 0
    ml = len(mainstr)
    for i in xrange(ml):
        for j in xrange(ml-i):
            if mainstr[i+j] != mainstr[j]:
                break
            j += 1
        tot += j
    print tot
    tries = tries - 1

EDIT 2: The third version of the code. It's still no go!

def mf():
    tries = int(raw_input())
    for _ in xrange(tries):
        mainstr = raw_input()
        tot = 0
        ml = len(mainstr)
        for i in xrange(ml):
            for j in xrange(ml-i):
                if mainstr[i+j] != mainstr[j]:
                    break
                j += 1
            tot += j
        print tot
mf()

Upvotes: 1

Answers (4)

fijal

Reputation: 3190

For such simple numeric scripts there are just two things you have to do:

Use PyPy (it does not have complex dependencies and will be massively faster)
Put most of the code in a function. That speeds up stuff for both CPython and PyPy quite drastically. Instead of:

some_code

do:

def main():
    some_code

if __name__ == '__main__':
    main()

That's pretty much it.

Cheers, fijal

Upvotes: 1

jfs

Reputation: 414745

You could improve it by a constant factor if you use i = mainstr.find(mainstr[0], i+1) instead of checking all i. Special case for i==0 also could help.

Put the code inside a function. It also might speed up things by a constant factor.

Use for ... else: j += 1 to avoid incrementing j at each step.

Try to find a better than O(n**2) algorithm that exploits the fact that you compare all suffixes of the string.

The most straight-forward C implementation is 100 times faster than CPython (Pypy is 10-30 times faster) and passes the challenge:

import os

def string_similarity(string, _cp=os.path.commonprefix):
    return sum(len(_cp([string, string[i:]])) for i in xrange(len(string)))

for _ in xrange(int(raw_input())):
    print string_similarity(raw_input())

The above optimizations give only several percents improvement and they are not enough to pass the challenge in CPython (Python time limit is only 8 time larger).

There is almost no difference (in CPython) between:

def string_similarity(string):
    len_string = len(string)
    total = len_string # similarity with itself
    for i in xrange(1, len_string):
        for n, c in enumerate(string[i:]):
            if c != string[n]:
                break
        else:
            n += 1

        total += n
    return total

And:

def string_similarity(string):
    len_string = len(string)
    total = len_string # similarity with itself
    i = 0
    while True:
        i = string.find(string[0], i+1)
        if i == -1:
            break
        n = 0
        for n in xrange(1, len_string-i):
            if string[i+n] != string[n]:
                break
        else:
            n += 1

        total += n
    return total

Upvotes: 2

Keith

Reputation: 43054

Here's mine. It passes the test case, but may not be the absolute fastest.

import sys

def simstring(string, other):
    val = 0
    for l, r in zip(string, other):
        if l != r:
            return val
        val += 1
    return val


dsize = sys.stdin.readline()

for i in range(int(dsize)):
    ss = 0
    string = sys.stdin.readline().strip()
    suffix = string
    while suffix:
        ss += simstring(string, suffix)
        suffix = suffix[1:]
    sys.stdout.write(str(ss)+"\n")

Upvotes: 0

Fred Foo

Reputation: 363797

You can skip the memory allocation inside the loop. substr = mainstr[i:] allocates a new string unnecessarily. You only use it in substr[j] != mainstr[j], which is equivalent to mainstr[i + j] != mainstr[j], so you don't need to build substr.

Memory allocations are expensive, so you'll want to avoid them in tight loops.

Upvotes: 2

How to optimize this python script further?

Answers (4)

Related Questions