Performance question about string slicing in Python

Question

I am learning some python and in the process of it I'm doing some simple katas from codewars. I run into https://www.codewars.com/kata/scramblies problem.

My solution went as follows:

def scramble(s1,s2):
    result = True
    for character in s2:
        if character not in s1:
            return False
        i = s1.index(character)
        s1 = s1[0:i] + s1[i+1:]
    return result

While it was correct result, it wasn't fast enough. My solution timed out after 12000 ms. I looked at the solutions presented by others and one involved making a set.

  def scramble(s1,s2):
     for letter in set(s2):
        if s1.count(letter) < s2.count(letter):
          return False
     return True

Why is my solution so much slower than the other one? It doesn't look like it should be unless I am misunderstanding something efficiency of slicing strings. Is my approach to solving this problem flawed or not pythonic?

kaya3 · Accepted Answer

For this kind of online programming challenge with a limit on your program's running time, the test inputs will include some quite large examples, and the time limit is usually set so that you don't have to squeeze every last millisecond of performance out of your code, but you do have to write an algorithm of a low enough computational complexity. To answer why your algorithm times out, we can analyse it to find its computational complexity using big O notation.

First we can label each individual statement with its complexity, where n is the length of s1 and m is the length of s2:

def scramble(s1,s2):
    result = True                  # O(1)
    for character in s2:           # loop runs O(m) times
        if character not in s1:        # O(n) to search characters in s1
            return False               # O(1)
        i = s1.index(character)        # O(n) to search characters in s1
        s1 = s1[0:i] + s1[i+1:]        # O(n) to build a new string
    return result                  # O(1)

Then the total complexity is O(1 + m*(n + 1 + n + n) + 1) or more simply, O(m*n). This is not efficient for this problem.

The key to why the alternate algorithm is faster lies in the fact that set(s2) contains only the distinct characters from the string s2. This is important because the alphabet that these strings are formed from has a constant, limited size; presumably 26 for the lowercase letters. Given this, the outer loop of the alternate algorithm actually runs at most 26 times:

def scramble(s1,s2):
    for letter in set(s2):                      # O(m) to build a set 
                                                # loop runs O(1) times
        if s1.count(letter) < s2.count(letter):     # O(n) + O(m) to count
                                                    # chars from s1 and s2
            return False                            # O(1)
    return True                                 # O(1)

This means the alternate algorithm's complexity is O(m + 1*(n + m + 1) + 1) or more simply O(m + n), meaning it is asymptotically more efficient than your algorithm.

Performance question about string slicing in Python

Answers (2)

Related Questions