Reputation: 455
I'm writing a list of strings as tab delimited file, using python 3.6
It is rare, but hypothetically possible, that there are tabs in the data. If so, I need to replace them with spaces, which I do like this:
row = [x.replace("\t", " ") for x in row]
The trouble is, this one line is responsible for about 1/4 of the runtime of the whole program, even though it almost never actually does anything.
Is there a faster way to purge tabs from my data?
Is there any way to take advantage of the fact that it probably doesn't have any tabs anyway?
I've tried working in bytes instead of strings, and that made no difference.
Upvotes: 0
Views: 61
Reputation: 42143
I tried various approaches, and the fastest one is to perform a conditional replacement only at the indexes where a tab is actually present:
def testReplace(sList):
    return [s.replace("\t", " ") for s in sList]

noTabs = str.maketrans("\t", " ")
def testTrans(sList):
    return [s.translate(noTabs) for s in sList]

def joinSplit(sList):
    return "\n".join(sList).replace("\t", " ").split("\n")

def conditional(sList):
    result = sList.copy()  # not needed if you intend to replace the list
    for i, s in enumerate(sList):
        if "\t" in s:
            result[i] = s.replace("\t", " ")
    return result
performance checks:
from timeit import timeit
count = 100
strings = ["Hello World"*10]*1000 # ["Hello \t World"*10]*1000
t = timeit(lambda:testReplace(strings),number=count)
print("replace",t)
t = timeit(lambda:testTrans(strings),number=count)
print("translate",t)
t = timeit(lambda:joinSplit(strings),number=count)
print("joinSplit",t)
t = timeit(lambda:conditional(strings),number=count)
print("conditional",t)
output:
# With tabs
replace 0.03365320100000002
translate 0.08165113099999993
joinSplit 0.027709890000000015
conditional 0.007067911000000038
# without tabs
replace 0.015160736000000008
translate 0.07439537500000004
joinSplit 0.017001820000000056
conditional 0.0065534649999999806
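The conditional version wins because "\t" in s is a fast substring scan, and when no tab is found the original string is reused instead of allocating a copy. As a rough sketch of how this could plug into your write loop (the write_tab_delimited name, fd handle and row layout here are assumptions, not your actual code):

# Minimal sketch, assuming rows is a list of lists of strings (hypothetical setup)
def write_tab_delimited(rows, fd):
    for row in rows:
        # only pay for replace() on the rare strings that contain a tab
        cleaned = [s.replace("\t", " ") if "\t" in s else s for s in row]
        fd.write("\t".join(cleaned) + "\n")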
Upvotes: 3
Reputation: 148965
Untested for performance, but I would use the csv module, which knows about fields containing newlines or separators and automatically quotes them:
import csv
with open(filename, 'w', newline='') as fd:
    wr = csv.writer(fd, delimiter='\t')
    ...
    wr.writerow(row)
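For example, with the writer's default minimal quoting, a field that happens to contain the delimiter is wrapped in quotes automatically. A small sketch using an in-memory buffer rather than your file:

import csv
import io

buf = io.StringIO()
wr = csv.writer(buf, delimiter='\t')
wr.writerow(["plain", "has\ta tab"])   # second field contains a tab
print(repr(buf.getvalue()))            # 'plain\t"has\ta tab"\r\n' - the tab-containing field is quoted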
Upvotes: 1