Kalanamith

Reputation: 20678

Remove all whitespace in a string

I want to eliminate all the whitespace from a string, on both ends, and in between words.

I have this Python code:

def my_handle(self):
    sentence = ' hello  apple  '
    sentence.strip()

But that only eliminates the whitespace on both sides of the string. How do I remove all whitespace?

Upvotes: 1269

Views: 2982952

Answers (16)

Cédric Julien

Reputation: 80831

If you want to remove leading and ending whitespace, use str.strip():

>>> "  hello  apple  ".strip()
'hello  apple'

If you want to remove all space characters, use str.replace() (NB this only removes the “normal” ASCII space character ' ' U+0020 but not any other whitespace):

>>> "  hello  apple  ".replace(" ", "")
'helloapple'

If you want to remove all whitespace and then leave a single space character between words, use str.split() followed by str.join():

>>> " ".join("  hello  apple  ".split())
'hello apple'

If you want to remove all whitespace then change the above leading " " to "":

>>> "".join("  hello  apple  ".split())
'helloapple'
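
As a minimal sketch of the difference between these two approaches (the sample string and variable names are illustrative assumptions, not taken from the question): replace() leaves tabs and newlines behind, while split() followed by join() drops every whitespace run.

```python
# Illustrative sample containing a tab and a newline as well as spaces.
sample = "  hello \t apple\n "

# Only U+0020 is removed; the tab and the newline survive.
only_spaces_removed = sample.replace(" ", "")

# split() with no argument splits on any whitespace, so nothing survives.
all_whitespace_removed = "".join(sample.split())

print(repr(only_spaces_removed))
print(repr(all_whitespace_removed))
```

Printing with repr() makes the leftover tab and newline visible, which a plain print() would hide.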

Upvotes: 2398

Emil Stenström

Reputation: 14106

An alternative is to use regular expressions and match these strange white-space characters too. Here are some examples:

Remove ALL whitespace in a string, even between words:

import re
sentence = re.sub(r"\s+", "", sentence, flags=re.UNICODE)

Remove whitespace in the BEGINNING of a string:

import re
sentence = re.sub(r"^\s+", "", sentence, flags=re.UNICODE)

Remove whitespace in the END of a string:

import re
sentence = re.sub(r"\s+$", "", sentence, flags=re.UNICODE)

Remove whitespace both at the BEGINNING and at the END of a string:

import re
sentence = re.sub(r"^\s+|\s+$", "", sentence, flags=re.UNICODE)

Remove ONLY DUPLICATE whitespace:

import re
sentence = " ".join(re.split(r"\s+", sentence, flags=re.UNICODE))

(All examples work in both Python 2 and Python 3)
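
A small sketch of these patterns applied to one input, assuming Python 3, where str patterns already match Unicode whitespace. The sample string and the variable names are made up for illustration.

```python
import re

# Illustrative input: starts with a no-break space (U+00A0) and mixes tabs and newlines.
sentence = "\u00a0 hello \t  apple \n"

# Remove every whitespace character, including the no-break space.
no_whitespace = re.sub(r"\s+", "", sentence)

# Trim both ends, then collapse each remaining run to a single space.
trimmed = re.sub(r"^\s+|\s+$", "", sentence)
collapsed = re.sub(r"\s+", " ", trimmed)

print(repr(no_whitespace))
print(repr(trimmed))
print(repr(collapsed))
```

Trimming before collapsing avoids the empty strings that re.split returns when the input starts or ends with whitespace, which would otherwise leave a single space at each end after joining.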

Upvotes: 167

James Bond

Reputation: 3006

Just an addition to Emil Stenström's answer.

This code trims all whitespace from both ends of the string, and you can also add your own extra Unicode characters to the list.

import re

def utf8trim(s: str) -> str:
    # Joined without a separator: inside [...] a "|" would itself be matched and stripped.
    spaces = "".join([r"\s", "\u2800", "\u3164", "\u1160", "\uFFA0", "\u202c"])
    return re.sub(f"^[{spaces}]+|[{spaces}]+$", "", s, flags=re.UNICODE)

Upvotes: 0

cottontail

Reputation: 23331

All strings are Unicode in Python 3; as a consequence, since str.split() splits on all whitespace characters, it also splits on Unicode whitespace characters. So the split + join syntax (as in 1, 2, 3) produces the same output as re.sub with the UNICODE flag (as in 4); in fact, the UNICODE flag is redundant here (as in 2, 5, 6, 7).

import re
import sys

# all unicode characters
sentence = ''.join(map(chr, range(sys.maxunicode+1)))

# remove all white space characters
x = ''.join(sentence.split())
y = re.sub(r"\s+", "", sentence, flags=re.UNICODE)
z = re.sub(r"\s+", "", sentence)

x == y == z      # True

In terms of performance, since Python's string methods are optimized, they are much faster than regex. As the following timeit test shows, when removing all whitespace characters from the string in the OP, the string methods are over 7 times faster than the re option.

import timeit

setup = """
import re
s = ' hello  \t apple  '
"""

t1 = min(timeit.repeat("''.join(s.split())", setup))
t2 = min(timeit.repeat(r"re.sub(r'\s+', '', s, flags=re.UNICODE)", setup))


t2 / t1  # 7.868004799367726

Upvotes: 2

user856387

Reputation: 91

I found that this works the best for me:

test_string = '  test   a   s   test '
string_list = [s.strip() for s in str(test_string).split()]
final_string = ' '.join(string_list)
# final_string: 'test a s test'

It removes any extra spaces, tabs, etc., leaving a single space between words.

Upvotes: 1

Jane Kathambi

Reputation: 945

In the following script we import the regular expression module and use it to replace each run of one or more spaces with a single space. This removes the extra inner spaces. Then we use the strip() method to remove leading and trailing whitespace.

# Import regular expression module
import re

# Initialize string
a = "     foo      bar   "

# First replace any number of spaces with a single space
a = re.sub(' +', ' ', a)

# Then strip any leading and trailing spaces.
a = a.strip()

# Show results
print(a)
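
Note that the pattern ' +' matches only the space character. As a hedged sketch (the input string and variable names here are assumptions for illustration), swapping in r'\s+' also collapses tabs and newlines:

```python
import re

# Illustrative input mixing spaces, a tab and a newline.
a = "  foo \t  bar \n"

# ' +' collapses runs of spaces but leaves the tab in place.
spaces_only = re.sub(' +', ' ', a).strip()

# r'\s+' treats any run of whitespace as one separator.
any_whitespace = re.sub(r'\s+', ' ', a).strip()

print(repr(spaces_only))
print(repr(any_whitespace))
```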

Upvotes: 3

naoki fujita

Reputation: 719

I use split() to discard all whitespace and join() to concatenate the remaining pieces.

sentence = ''.join(' hello  apple  '.split())
print(sentence) #=> 'helloapple'

I prefer this approach because it is a single expression (not a statement).
It is easy to use, and it can be used without binding the result to a variable.

print(''.join(' hello  apple  '.split())) # no need to bind to a variable

Upvotes: 6

Assad Ali

Reputation: 288

Try this. Instead of using re, I think using split with strip is much better:

def my_handle(self):
    sentence = ' hello  apple  '
    ' '.join(x.strip() for x in sentence.split())  # 'hello apple'
    ''.join(x.strip() for x in sentence.split())   # 'helloapple'

Upvotes: -2

MaK

Reputation: 1728

"Whitespace" includes spaces, tabs, and CRLF. So an elegant one-liner we can use is str.translate:

Python 3

' hello  apple '.translate(str.maketrans('', '', ' \n\t\r'))

OR if you want to be thorough:

import string
' hello  apple'.translate(str.maketrans('', '', string.whitespace))
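
One caveat worth sketching for Python 3: string.whitespace holds only the six ASCII whitespace characters, so Unicode whitespace such as the no-break space is left alone. The sample string and variable names below are assumptions for illustration.

```python
import string

# Build the deletion table once; string.whitespace is ' \t\n\r\x0b\x0c' (ASCII only).
delete_ascii_whitespace = str.maketrans("", "", string.whitespace)

# Illustrative input containing a no-break space (U+00A0).
sample = " hello\u00a0 \tapple\r\n"

translated = sample.translate(delete_ascii_whitespace)  # U+00A0 is kept
split_joined = "".join(sample.split())                  # U+00A0 is removed

print(repr(translated))
print(repr(split_joined))
```

If the input may contain non-ASCII whitespace, the split and join form covers it without listing the characters by hand.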

Python 2

' hello  apple'.translate(None, ' \n\t\r')

OR if you want to be thorough:

import string
' hello  apple'.translate(None, string.whitespace)

Upvotes: 66

handle

Reputation: 6359

eliminate all the whitespace from a string, on both ends, and in between words.

>>> import re
>>> re.sub(r"\s+",  # one or more whitespace characters
...        '',      # replace with empty string (-> remove)
...        ''' hello
...    apple
... ''')
'helloapple'

Python docs:

Upvotes: 6

Amnon Harel

Reputation: 171

' hello  \n\tapple'.translate({ord(c):None for c in ' \n\t\r'})

MaK already pointed out the "translate" method above. And this variation works with Python 3 (see this Q&A).

Upvotes: 13

cacti5

Reputation: 2106

In addition, strip has some variations:

Remove spaces in the BEGINNING and END of a string:

sentence = sentence.strip()

Remove spaces in the BEGINNING of a string:

sentence = sentence.lstrip()

Remove spaces in the END of a string:

sentence = sentence.rstrip()

All three string methods, strip, lstrip, and rstrip, can take a string of characters to strip, with the default being all whitespace. This can be helpful when you are working with something particular. For example, you could remove only spaces but not newlines:

" 1. Step 1\n".strip(" ")

Or you could remove extra commas when reading in a string list:

"1,2,3,".strip(",")
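
A short sketch of how the argument behaves (the sample values and variable names are illustrative assumptions). The argument is treated as a set of characters to remove from the ends, not as a prefix or suffix to match:

```python
# Only the leading space is removed; the trailing newline is not in the set.
step = " 1. Step 1\n".strip(" ")

# Commas are removed from both ends, however many there are.
numbers = ",1,2,3,,".strip(",")

# "xy" means "any of x or y", so mixed runs at either end are stripped.
word = "xyxhelloyx".strip("xy")

print(repr(step))
print(repr(numbers))
print(repr(word))
```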

Upvotes: 11

yan bellavance

Reputation: 4840

Be careful:

strip does a rstrip and lstrip (removes leading and trailing spaces, tabs, returns and form feeds, but it does not remove them in the middle of the string).

If you only replace spaces and tabs you can end up with hidden CRLFs that appear to match what you are looking for, but are not the same.
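
A minimal sketch of that pitfall, using a made-up input line with a Windows-style line ending (the names are assumptions for illustration):

```python
# Illustrative input: looks like "apple" when printed, but ends in tab, space, CR, LF.
raw = "apple\t \r\n"

# Replacing only spaces and tabs leaves the CRLF in place.
partly_cleaned = raw.replace(" ", "").replace("\t", "")

# split() + join() removes every whitespace character, CR and LF included.
fully_cleaned = "".join(raw.split())

print(partly_cleaned == "apple")  # the comparison fails because of the hidden "\r\n"
print(fully_cleaned == "apple")
```

Checking with repr() is a quick way to spot such hidden characters when two strings look identical but do not compare equal.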

Upvotes: 7

PrabhuPrakash

Reputation: 261

import re
sentence = ' hello  apple'
re.sub(' ', '', sentence)   # 'helloapple' (removes all spaces)
re.sub('  ', ' ', sentence) # ' hello apple' (replaces each double space with a single space)

Upvotes: 3

Mark Byers

Reputation: 838974

To remove only spaces use str.replace:

sentence = sentence.replace(' ', '')

To remove all whitespace characters (space, tab, newline, and so on) you can use split then join:

sentence = ''.join(sentence.split())

or a regular expression:

import re
pattern = re.compile(r'\s+')
sentence = re.sub(pattern, '', sentence)

If you want to only remove whitespace from the beginning and end you can use strip:

sentence = sentence.strip()

You can also use lstrip to remove whitespace only from the beginning of the string, and rstrip to remove whitespace from the end of the string.
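
For comparison, a quick sketch of the three methods on one input (the sample string and variable names are assumptions for illustration):

```python
# Illustrative input with whitespace on both ends.
sentence = "\t hello apple \n"

left_trimmed = sentence.lstrip()   # leading whitespace removed only
right_trimmed = sentence.rstrip()  # trailing whitespace removed only
both_trimmed = sentence.strip()    # both ends removed, inner space kept

print(repr(left_trimmed))
print(repr(right_trimmed))
print(repr(both_trimmed))
```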

Upvotes: 450

wal-o-mat

Reputation: 7344

For removing whitespace from beginning and end, use strip.

>>> "  foo bar   ".strip()
'foo bar'

Upvotes: 19
