Revinder
Revinder

Reputation: 27

Performance question about string slicing in Python

I am learning some python and in the process of it I'm doing some simple katas from codewars. I run into https://www.codewars.com/kata/scramblies problem.

My solution went as follows:

def scramble(s1,s2):
    result = True
    for character in s2:
        if character not in s1:
            return False
        i = s1.index(character)
        s1 = s1[0:i] + s1[i+1:]
    return result

While it was correct result, it wasn't fast enough. My solution timed out after 12000 ms. I looked at the solutions presented by others and one involved making a set.

  def scramble(s1,s2):
     for letter in set(s2):
        if s1.count(letter) < s2.count(letter):
          return False
     return True

Why is my solution so much slower than the other one? It doesn't look like it should be unless I am misunderstanding something efficiency of slicing strings. Is my approach to solving this problem flawed or not pythonic?

Upvotes: 2

Views: 1071

Answers (2)

kaya3
kaya3

Reputation: 51037

For this kind of online programming challenge with a limit on your program's running time, the test inputs will include some quite large examples, and the time limit is usually set so that you don't have to squeeze every last millisecond of performance out of your code, but you do have to write an algorithm of a low enough computational complexity. To answer why your algorithm times out, we can analyse it to find its computational complexity using big O notation.

First we can label each individual statement with its complexity, where n is the length of s1 and m is the length of s2:

def scramble(s1,s2):
    result = True                  # O(1)
    for character in s2:           # loop runs O(m) times
        if character not in s1:        # O(n) to search characters in s1
            return False               # O(1)
        i = s1.index(character)        # O(n) to search characters in s1
        s1 = s1[0:i] + s1[i+1:]        # O(n) to build a new string
    return result                  # O(1)

Then the total complexity is O(1 + m*(n + 1 + n + n) + 1) or more simply, O(m*n). This is not efficient for this problem.


The key to why the alternate algorithm is faster lies in the fact that set(s2) contains only the distinct characters from the string s2. This is important because the alphabet that these strings are formed from has a constant, limited size; presumably 26 for the lowercase letters. Given this, the outer loop of the alternate algorithm actually runs at most 26 times:

def scramble(s1,s2):
    for letter in set(s2):                      # O(m) to build a set 
                                                # loop runs O(1) times
        if s1.count(letter) < s2.count(letter):     # O(n) + O(m) to count
                                                    # chars from s1 and s2
            return False                            # O(1)
    return True                                 # O(1)

This means the alternate algorithm's complexity is O(m + 1*(n + m + 1) + 1) or more simply O(m + n), meaning it is asymptotically more efficient than your algorithm.

Upvotes: 1

Green Cloak Guy
Green Cloak Guy

Reputation: 24681

First of all, set is fast and very good at its job. For things like in, set is faster than list.

Second of all, your solution is doing way more work than the correct solution. Note how the second solution never modifies s1 or s2, whereas your solution both takes two slices of s1 and then reassigns s1. This, along with calling .index(). Slicing isn't the fastest operation, mainly because memory has to be allocated and data has to be copied. .remove() would probably be faster than the combination of .index() and slicing that you're doing.

The underlying message here is if the task can be done in fewer operations, it's obviously going to execute more quickly. Slicing is also more expensive than most other methods because allocating space and copying memory is a more expensive operation than the computational methods like .count() that the correct solution uses.

Upvotes: 0

Related Questions