F. Serna
F. Serna

Reputation: 249

Why is my list comprehension function slower than my list appending function for string concatenation?

I'm working on a problem that takes in a string and outputs the compressed string with the outputs of repeated characters. For example: aaabbcdddd would return a3b2c1b4. It would also have to account for special cases like the same character occurring later in the string like abcccccaaad would need to return a1b1c5a3d1 instead of a4b1c5d1. I have implemented two methods using list comprehension and building a list of strings then joining it.

Based on the info from: Efficient String Concatenation, my list comprehension function should be faster than building a list of strings and then joining it. Here are my two separate functions:

Using list comprehension:

counter = 1 # global count of repeating characters

def getCount(str1, char, ind):
    global counter
    returnStr = ''

    if char == str1[ind - 1]: # if current char is equal previous char, inc counter
        counter += 1
    else:        
        returnStr = f"{counter}{char}" # else return current count and new char
        counter = 1 # reset counter

    if ind == len(str1) - 1:
        returnStr += f"{counter}" # if at the end of str, append counter to returnStr

    return returnStr

def stringCompression(str1):
    # getCount() for indexes above 0 to be able to compare to previous characters
    newStr = ''.join([getCount(str1, char, ind) if ind > 0 else char for ind, char in enumerate(str1)])
    return newStr

Building list of strings then joining:

def stringCompression(str1):
    strArr = []
    counter = 0

    for char in str1:
        if not len(strArr):
            strArr.append(char)  # append first char to empty strArr
        elif char != strArr[-1]:
            # if char doesn't equal previous char, append counter and new char then reset counter
            strArr.append(str(counter))
            strArr.append(char)
            counter = 0
        counter += 1 # inc counter every pass

    strArr.append(str(counter)) # append final count of latest char
    return ''.join(strArr)

I am using this code to time each method:

start = time.time()
stringCompression('aabcccccaaaq'*100000)
end = time.time()
print("stringCompression = ", end - start)

And my list comprehension method is almost twice as slow as my other method. I am wondering why this is the case. Is it because I enumerate my list, which I need to do to keep track of my current index, or because my getCount() function is inefficient, or some other reason that I am not aware of?

Upvotes: 0

Views: 92

Answers (2)

Daniel
Daniel

Reputation: 42778

With the list-comprehension, three if-conditions, one function call and global. The second version only uses two if-conditions.

You can reduce to one if-condition inside the loop, to get even faster results:

def string_compression(string):
    string = iter(string)
    char = next(string)
    result = [char]
    counter = 1
    for new_char in string:
        if new_char != char:
            # if char doesn't equal previous char, append counter and new char then reset counter
            result.append(str(counter))
            result.append(new_char)
            counter = 0
            char = new_char
        counter += 1 # inc counter every pass
    result.append(str(counter)) # append final count of latest char
    return ''.join(result)

Upvotes: 1

chepner
chepner

Reputation: 532418

Calling a user-defined function and accessing a global variable are both relatively expensive. Your second version effectively inlines getCount, greatly reducing the overhead in breaking up the input string.

Upvotes: 1

Related Questions