eozzy

Reputation: 68660

Regex for removing whitespace

import re

def remove_whitespaces(value):
    "Remove all whitespaces"
    p = re.compile(r'\s+')
    return p.sub(' ', value)

The above code strips tags but doesn't remove all whitespace from the value.

Thanks

Upvotes: 1

Views: 2094

Answers (6)

abhiomkar

Reputation: 5028

re.sub(r'\s*', '', value) should also work!

Upvotes: 1

ghostdog74

Reputation: 342333

@OP, compile your regex pattern outside the function, so you don't have to call re.compile every time you use it. Also, you are substituting each match with a single space; that's not removing spaces, is it?

import re

p = re.compile(r'\s+')  # compiled once, at module level

def remove_whitespaces(p, value):
    "Remove all whitespaces"
    return p.sub('', value)

Lastly, another method that doesn't use a regex is to split on whitespace and join the pieces back together:

def remove_whitespaces(value):
    "Remove all whitespaces"    
    return ''.join(value.split())
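A minimal Python 3 sketch putting the two suggestions side by side: a module-level precompiled pattern and the split/join approach. The function names are hypothetical; both should return the same result for ordinary ASCII whitespace.

```python
import re

# Compiled once at module level: one or more whitespace characters.
WHITESPACE_RE = re.compile(r'\s+')

def remove_whitespace_regex(value):
    """Delete every whitespace character using the precompiled regex."""
    return WHITESPACE_RE.sub('', value)

def remove_whitespace_split(value):
    """Delete every whitespace character by splitting on whitespace and re-joining."""
    return ''.join(value.split())

sample = " a\tb\n c  d\r\n"
print(remove_whitespace_regex(sample))  # abcd
print(remove_whitespace_split(sample))  # abcd
```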

Upvotes: 0

Tomasz Zieliński

Reputation: 16346

Maybe ''.join(value.split()) could work for you?

Upvotes: 1

Alex Martelli

Reputation: 881595

The fastest general approach eschews REs in favor of string's fast, powerful .translate method:

import string
identity = string.maketrans('', '')

def remove_whitespace(value):
  return value.translate(identity, string.whitespace)

In Python 2.6, it's even simpler; the body is just:

  return value.translate(None, string.whitespace)
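The snippets above are Python 2 only (string.maketrans no longer exists in Python 3, and str.translate there takes a single table). A hedged Python 3 sketch of the same idea, assuming you only want to delete the ASCII characters in string.whitespace: str.maketrans accepts a third argument listing characters to delete, and bytes.translate still accepts None as the table.

```python
import string

# Python 3: the third argument to str.maketrans lists characters to delete.
DELETE_WHITESPACE = str.maketrans('', '', string.whitespace)

def remove_whitespace(value):
    """Delete the ASCII whitespace characters listed in string.whitespace."""
    return value.translate(DELETE_WHITESPACE)

print(remove_whitespace("a b\tc\nd"))  # abcd

# For bytes, the Python 2.6 style call still works: table=None, then bytes to delete.
print(b" a\tb\n".translate(None, string.whitespace.encode('ascii')))  # b'ab'
```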

Note that this applies to "plain" Python 2.* strings, i.e., bytestrings. The .translate method of Unicode strings is somewhat different: it takes a single argument, which must be a mapping from the ord values of Unicode characters to Unicode strings, or to None for deletion. For example, taking advantage of dict's handy .fromkeys classmethod:

nospace = dict.fromkeys(ord(c) for c in string.whitespace)

def unicode_remove_whitespace(value):
  return value.translate(nospace)

to remove exactly the same set of characters. Of course, Unicode also has more characters you could consider whitespace and want to remove -- so you'd probably want to build a mapping unicode_nospace based on information from module unicodedata, rather than using this simpler approach based on module string.
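One possible way to build such a unicode_nospace mapping with unicodedata, sketched in Python 3 (where str is Unicode and chr replaces unichr). The choice of what counts as whitespace is an assumption here: every code point in the separator categories Zs, Zl and Zp, plus the ASCII characters in string.whitespace (tab, newline and friends are category Cc, so they must be added separately).

```python
import string
import sys
import unicodedata

# Assumption: "whitespace" = Unicode separator categories plus string.whitespace.
SEPARATOR_CATEGORIES = {'Zs', 'Zl', 'Zp'}

# Start from the ASCII set, mapping each ord value to None (delete).
unicode_nospace = dict.fromkeys(ord(c) for c in string.whitespace)
# Add every code point whose general category is a separator (scans all of Unicode once).
unicode_nospace.update(
    dict.fromkeys(
        cp for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)) in SEPARATOR_CATEGORIES
    )
)

def unicode_remove_whitespace(value):
    """Delete every character whose code point is a key in unicode_nospace."""
    return value.translate(unicode_nospace)

# U+00A0 no-break space, U+3000 ideographic space, U+2028 line separator.
print(unicode_remove_whitespace("a\u00a0b\u3000c\u2028d e"))  # abcde
```

Building the table scans every code point once, so it is done at import time rather than per call. A variant could test chr(cp).isspace() instead, which also catches a few control characters such as U+0085.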

Upvotes: 6

eozzy

Reputation: 68660

re.sub(r'\s', '', value) works well for me in this case.

Upvotes: 0

danben

Reputation: 83250

p.sub(' ', value)

should be

p.sub('', value)

The former replaces each run of whitespace with a single space; the latter replaces it with nothing.
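A small Python 3 sketch of the difference, using the same r'\s+' pattern as the question on an illustrative input string:

```python
import re

p = re.compile(r'\s+')
value = "  hello \t wide\n world  "

# Each run of whitespace becomes exactly one space (leading and trailing runs too).
print(repr(p.sub(' ', value)))  # ' hello wide world '
# Each run of whitespace is deleted outright.
print(repr(p.sub('', value)))   # 'hellowideworld'
```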

Upvotes: 3
