What is more efficient? Using .replace() or passing string to list

Question

Solving the following problem from CodeFights:

Given two strings, find the number of common characters between them. For s1 = "aabcc" and s2 = "adcaa", the output should be commonCharacterCount(s1, s2) = 3.

Strings have 3 common characters - 2 "a"s and 1 "c".

The way I approached it, whenever I took a letter into account I wanted to cancel it out so as not to count it again. I know strings are immutable, even when using methods such as .replace() (replace() method returns a copy of the string, no the actual string changed).

In order to mutate said string what I tend to do at the start is simply pass it on to a list with list(mystring) and then mutate that.

Question is... what is more efficient of the following? Take into account that option B gets done over and over, worst case scenario the strings are equal and have a match for match. Meanwhile option A happens once.

Option A)

list(mystring)

Option B)

mystring = mystring.replace(letterThatMatches, "")

willeM_ Van Onsem · Accepted Answer

The idea of calling replace on the string for each element you find, is simply not a good idea: it takes O(n) to do that. If you do that for every character in the other string, it will result in an O(m×n) algorithm with m the number of characters of the first string, and n the number of characters from the second string.

You can simply use two Counters, then calculate the minimum of the two, and then calculate the sum of the counts. Like:

from collections import Counter

def common_chars(s1,s2):
    c1 = Counter(s1) # count for every character in s1, the amount
    c2 = Counter(s2) # count for every character in s2, the amount
    c3 = c1 & c2 # produce the minimum of each character
    return sum(c3.values()) # sum up the resulting counts

Or as a one-liner:

def common_chars(s1,s2):
    return sum((Counter(s1) & Counter(s2)).values())

If dictionary lookup can be done in O(1) (which usually holds for an average case), this is an O(m+n) algorithm: the counting then happens in O(m) and O(n) respectively, and calculating the minimum runs in the number of different characters (which is at most O(max(m,n)). Finally taking the sum is again an O(max(m,n)) operation.

For the sample input, this results in:

>>> common_chars("aabcc","adcaa")
3

What is more efficient? Using .replace() or passing string to list

Answers (1)

Related Questions