Reputation: 249
I'm working on a problem that takes in a string and outputs the compressed string with the outputs of repeated characters. For example: aaabbcdddd
would return a3b2c1b4
. It would also have to account for special cases like the same character occurring later in the string like abcccccaaad
would need to return a1b1c5a3d1
instead of a4b1c5d1
. I have implemented two methods using list comprehension and building a list of strings then joining it.
Based on the info from: Efficient String Concatenation, my list comprehension function should be faster than building a list of strings and then joining it. Here are my two separate functions:
Using list comprehension:
counter = 1 # global count of repeating characters
def getCount(str1, char, ind):
global counter
returnStr = ''
if char == str1[ind - 1]: # if current char is equal previous char, inc counter
counter += 1
else:
returnStr = f"{counter}{char}" # else return current count and new char
counter = 1 # reset counter
if ind == len(str1) - 1:
returnStr += f"{counter}" # if at the end of str, append counter to returnStr
return returnStr
def stringCompression(str1):
# getCount() for indexes above 0 to be able to compare to previous characters
newStr = ''.join([getCount(str1, char, ind) if ind > 0 else char for ind, char in enumerate(str1)])
return newStr
Building list of strings then joining:
def stringCompression(str1):
strArr = []
counter = 0
for char in str1:
if not len(strArr):
strArr.append(char) # append first char to empty strArr
elif char != strArr[-1]:
# if char doesn't equal previous char, append counter and new char then reset counter
strArr.append(str(counter))
strArr.append(char)
counter = 0
counter += 1 # inc counter every pass
strArr.append(str(counter)) # append final count of latest char
return ''.join(strArr)
I am using this code to time each method:
start = time.time()
stringCompression('aabcccccaaaq'*100000)
end = time.time()
print("stringCompression = ", end - start)
And my list comprehension method is almost twice as slow as my other method. I am wondering why this is the case. Is it because I enumerate
my list, which I need to do to keep track of my current index, or because my getCount()
function is inefficient, or some other reason that I am not aware of?
Upvotes: 0
Views: 92
Reputation: 42778
With the list-comprehension, three if-conditions, one function call and global
.
The second version only uses two if-conditions.
You can reduce to one if-condition inside the loop, to get even faster results:
def string_compression(string):
string = iter(string)
char = next(string)
result = [char]
counter = 1
for new_char in string:
if new_char != char:
# if char doesn't equal previous char, append counter and new char then reset counter
result.append(str(counter))
result.append(new_char)
counter = 0
char = new_char
counter += 1 # inc counter every pass
result.append(str(counter)) # append final count of latest char
return ''.join(result)
Upvotes: 1
Reputation: 532418
Calling a user-defined function and accessing a global variable are both relatively expensive. Your second version effectively inlines getCount
, greatly reducing the overhead in breaking up the input string.
Upvotes: 1