Reputation: 455
I'm writing a list of strings as tab delimited file, using python 3.6
It is rare, but hypothetically possible, that there are tabs in the data. If so, I need to replace them with spaces, which I do like this:
row = [x.replace("\t", " ") for x in row]
The trouble is, this one line is responsible for about 1/4 of the runtime of the whole program, even though it almost never actually does anything.
Is there a faster way to purge tabs from my data?
Is there any way to take advantage of the fact that it probably doesn't have any tabs anyway?
I've tried working in bytes instead of strings, and that made no difference.
Upvotes: 0
Views: 61
Reputation: 42143
I tried various approaches, and the fastest one is to perform a conditional replacement only at the indexes where a tab is actually present:
def testReplace(sList):
    return [s.replace("\t", " ") for s in sList]

noTabs = str.maketrans("\t", " ")
def testTrans(sList):
    return [s.translate(noTabs) for s in sList]

def joinSplit(sList):
    return "\n".join(sList).replace("\t", " ").split("\n")

def conditional(sList):
    result = sList.copy()  # not needed if you intend to replace the list
    for i, s in enumerate(sList):
        if "\t" in s:
            result[i] = s.replace("\t", " ")
    return result
performance checks:
from timeit import timeit
count = 100
strings = ["Hello World"*10]*1000 # ["Hello \t World"*10]*1000
t = timeit(lambda:testReplace(strings),number=count)
print("replace",t)
t = timeit(lambda:testTrans(strings),number=count)
print("translate",t)
t = timeit(lambda:joinSplit(strings),number=count)
print("joinSplit",t)
t = timeit(lambda:conditional(strings),number=count)
print("conditional",t)
output:
# With tabs
replace 0.03365320100000002
translate 0.08165113099999993
joinSplit 0.027709890000000015
conditional 0.007067911000000038
# without tabs
replace 0.015160736000000008
translate 0.07439537500000004
joinSplit 0.017001820000000056
conditional 0.0065534649999999806
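The conditional version wins because "\t" in s is a fast substring scan, and when no tab is found the original string is reused instead of allocating a copy. As a rough sketch of how this could plug into your write loop (the write_tab_delimited name, fd handle and row layout here are assumptions, not your actual code):

# Minimal sketch, assuming rows is a list of lists of strings (hypothetical setup)
def write_tab_delimited(rows, fd):
    for row in rows:
        # only pay for replace() on the rare strings that contain a tab
        cleaned = [s.replace("\t", " ") if "\t" in s else s for s in row]
        fd.write("\t".join(cleaned) + "\n")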
Upvotes: 3
Reputation: 148965
Untested for performance, but I would use the csv module, which knows about fields containing newlines or separators and automatically quotes them:
import csv
with open(filename, 'w', newline='') as fd:
    wr = csv.writer(fd, delimiter='\t')
    ...
    wr.writerow(row)
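For example, with the writer's default minimal quoting, a field that happens to contain the delimiter is wrapped in quotes automatically. A small sketch using an in-memory buffer rather than your file:

import csv
import io

buf = io.StringIO()
wr = csv.writer(buf, delimiter='\t')
wr.writerow(["plain", "has\ta tab"])   # second field contains a tab
print(repr(buf.getvalue()))            # 'plain\t"has\ta tab"\r\n' - the tab-containing field is quoted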
Upvotes: 1