user984003

Reputation: 29527

Python fastest way to remove multiple spaces in a string

This question has been asked before, but the fast answers that I have seen also remove the leading and trailing spaces, which I don't want.

"   a     bc    "

should become

" a bc "

I have

text = re.sub(' +', " ", text)

but I am hoping for something faster. The suggestion that I have seen (and which won't work here, because it strips both ends) is

' '.join(text.split())

Note that I will be doing this to lots of small texts, so adding a separate check for a leading or trailing space on each one would not be ideal.
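To make the requirement concrete, here is a minimal check of both snippets against the example string. The variable names `collapsed` and `stripped` are illustrative only and do not come from the question.

```python
import re

text = "   a     bc    "

# Regex version: each run of spaces becomes a single space, ends included.
collapsed = re.sub(' +', ' ', text)

# split/join version: runs collapse, but leading and trailing spaces vanish.
stripped = ' '.join(text.split())

print(repr(collapsed))  # ' a bc '
print(repr(stripped))   # 'a bc'
```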

Upvotes: 1

Views: 4571

Answers (3)

Has QUIT--Anony-Mousse

Reputation: 77454

If you really want to optimize something like this, use C, not Python.

Try Cython, which is pretty much Python syntax but can get close to C speed.

Here is something you can time:

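import array  # Python 2 only: the 'c' typecode was removed in Python 3
buf = array.array('c')
input = "   a     bc    "
space = False  # True when the previous character was a space
for c in input:
    # Skip a space only when it directly follows another space
    if not space or c != ' ':
        buf.append(c)
    space = (c == ' ')
buf.tostring()  # ' a bc '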

Also try using cStringIO:

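import cStringIO  # Python 2 only; Python 3 provides io.StringIO instead
buf = cStringIO.StringIO()
input = "   a     bc    "
space = False  # True when the previous character was a space
for c in input:
    # Skip a space only when it directly follows another space
    if not space or c != ' ':
        buf.write(c)
    space = (c == ' ')
buf.getvalue()  # ' a bc '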

But again, if you want to make such things really fast, don't do it in Python. Use Cython. The two approaches I gave here will likely be slower than the regex or split/join versions, because they put much more work on the Python interpreter. If you want these things to be fast, do as little as possible in Python itself. The character-by-character for loop alone probably cancels out any theoretical performance advantage of the approaches above.
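For readers on Python 3, where `cStringIO` and the `'c'` array typecode no longer exist, the same character-by-character loop can be sketched with `io.StringIO`. The function name `collapse_spaces_loop` is hypothetical, and this is meant as something to time against the other answers, not as a recommended solution.

```python
import io


def collapse_spaces_loop(text):
    """Collapse each run of spaces to one space, one character at a time."""
    buf = io.StringIO()
    prev_space = False  # True when the previous character was a space
    for c in text:
        # Write everything except a space that directly follows a space.
        if not (prev_space and c == ' '):
            buf.write(c)
        prev_space = (c == ' ')
    return buf.getvalue()


print(repr(collapse_spaces_loop("   a     bc    ")))  # ' a bc '
```

Unlike the split/join variants, this only touches the space character, so tabs and newlines pass through unchanged.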

Upvotes: 2

Fredrik Pihl

Reputation: 45644

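FWIW, some timings (note that the third run uses the already-collapsed string " a bc " as its input, so it is not directly comparable with the others):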

$ python -m timeit -s 's="   a     bc    "' 't=s[:]' "while '  ' in t: t=t.replace('  ', ' ')"
1000000 loops, best of 3: 1.05 usec per loop

$ python -m timeit -s 'import re;s="   a     bc    "'  "re.sub(' +', ' ', s)"
100000 loops, best of 3: 2.27 usec per loop

$ python -m timeit -s 's=" a bc "' "''.join((s[0],' '.join(s[1:-1].split()),s[-1]))"
1000000 loops, best of 3: 0.592 usec per loop

$ python -m timeit -s 'import re;s="   a     bc    "'  "re.sub(' {2,}', ' ', s)"
100000 loops, best of 3: 2.34 usec per loop

$ python -m timeit -s 's="   a     bc    "' '" "+" ".join(s.split())+" "'
1000000 loops, best of 3: 0.387 usec per loop
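To rerun a comparison like this with every candidate on the same input, a small `timeit` harness along the following lines may help. It is a sketch only: the function names and `time_candidates` are hypothetical, the regex is precompiled (unlike the `re.sub` calls timed above), and each measurement includes the overhead of one extra function call, so absolute numbers will differ from the command-line runs.

```python
import re
import timeit

TEXT = "   a     bc    "
_SPACES = re.compile(' +')  # precompiled; the runs above call re.sub directly


def with_replace_loop(s):
    # Repeatedly halve runs of spaces until no double space is left.
    while '  ' in s:
        s = s.replace('  ', ' ')
    return s


def with_regex(s):
    return _SPACES.sub(' ', s)


def with_split_join(s):
    # Only valid when s starts and ends with a space, as TEXT does.
    return ' ' + ' '.join(s.split()) + ' '


CANDIDATES = (with_replace_loop, with_regex, with_split_join)


def time_candidates(text=TEXT, number=20000, repeat=3):
    """Return the best time per call, in microseconds, for each candidate."""
    results = {}
    for func in CANDIDATES:
        runs = timeit.repeat(lambda: func(text), number=number, repeat=repeat)
        results[func.__name__] = min(runs) / number * 1e6
    return results


if __name__ == "__main__":
    for name, usec in time_candidates().items():
        print("%-20s %.3f usec per call" % (name, usec))
```

Two caveats about the one-liners timed above: the unconditional `" " + ... + " "` form always adds a space at both ends, and the `s[0] ... s[-1]` form drops the separator in a string such as `"a  b"`, so neither is a general replacement for the regex.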

Upvotes: 3

Slater Victoroff

Reputation: 21914

This is just a small rewrite of the split/join suggestion from the question. The fact that an approach has a small fault doesn't mean you should assume it can't be made to work.

You could easily do something like:

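# startswith/endswith also work on an empty string, where x[0] would raise IndexError
front_space = lambda x: x.startswith(" ")
trailing_space = lambda x: x.endswith(" ")
# Multiplying a string by a bool yields it once (True) or not at all (False)
" " * front_space(text) + ' '.join(text.split()) + " " * trailing_space(text)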
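Wrapped in a function, the same idea might look like the sketch below. The name `collapse_spaces` is hypothetical, and the code assumes spaces are the only whitespace in the input, since `str.split()` with no arguments also splits on tabs and newlines. It adds one guard that the one-liner lacks: for an input made up entirely of spaces, the one-liner would return two spaces instead of one.

```python
def collapse_spaces(text):
    """Collapse runs of spaces to one space, keeping one at each end if present."""
    core = ' '.join(text.split())
    if not core:
        # Empty or all-space input: avoid emitting a space for each end.
        return ' ' if text else ''
    lead = ' ' if text[0] == ' ' else ''
    trail = ' ' if text[-1] == ' ' else ''
    return lead + core + trail


print(repr(collapse_spaces("   a     bc    ")))  # ' a bc '
```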

Upvotes: 0
