Reputation: 3707
Solving the following problem from CodeFights:
Given two strings, find the number of common characters between them. For s1 = "aabcc" and s2 = "adcaa", the output should be commonCharacterCount(s1, s2) = 3.
Strings have 3 common characters - 2 "a"s and 1 "c".
The way I approached it, whenever I took a letter into account I wanted to cancel it out so as not to count it again. I know strings are immutable, even when using methods such as .replace()
(replace() method returns a copy of the string, no the actual string changed).
In order to mutate said string
what I tend to do at the start is simply pass it on to a list with list(mystring)
and then mutate that.
Question is... what is more efficient of the following? Take into account that option B gets done over and over, worst case scenario the strings are equal and have a match for match. Meanwhile option A happens once.
Option A)
list(mystring)
Option B)
mystring = mystring.replace(letterThatMatches, "")
Upvotes: 1
Views: 168
Reputation: 476719
The idea of calling replace
on the string for each element you find, is simply not a good idea: it takes O(n) to do that. If you do that for every character in the other string, it will result in an O(m×n) algorithm with m the number of characters of the first string, and n the number of characters from the second string.
You can simply use two Counter
s, then calculate the minimum of the two, and then calculate the sum of the counts. Like:
from collections import Counter
def common_chars(s1,s2):
c1 = Counter(s1) # count for every character in s1, the amount
c2 = Counter(s2) # count for every character in s2, the amount
c3 = c1 & c2 # produce the minimum of each character
return sum(c3.values()) # sum up the resulting counts
Or as a one-liner:
def common_chars(s1,s2):
return sum((Counter(s1) & Counter(s2)).values())
If dictionary lookup can be done in O(1) (which usually holds for an average case), this is an O(m+n) algorithm: the counting then happens in O(m) and O(n) respectively, and calculating the minimum runs in the number of different characters (which is at most O(max(m,n)). Finally taking the sum is again an O(max(m,n)) operation.
For the sample input, this results in:
>>> common_chars("aabcc","adcaa")
3
Upvotes: 7