Regex: remove non-alpha characters, keeping spaces

Question

I've written a simple function that strips all non-alpha characters from a string, leaving a single space wherever they were removed.

Currently it relies on two regular expressions. In the interest of brevity, I'd like to reduce those two regexes to one. Is this possible?

import re

def junk_to_alpha(s):
  reg = r"[^A-Za-z]"
  p = re.compile(reg)
  s = re.sub(p, " ", s)
  p = re.compile(r"\s+")
  s = re.sub(p, " ", s)
  return s

print(junk_to_alpha(r"Spoons! 12? \/@# ,.1 12 Yeah? {[]}"))

# Spoons Yeah

Wiktor Stribiżew · Accepted Answer

You can wrap [^A-Za-z]+ in \s* on both sides, so that one pattern matches each whole run of junk together with any whitespace around it:

import re

def junk_to_alpha(s):
  s = re.sub(r"\s*[^A-Za-z]+\s*", " ", s)
  return s

print(junk_to_alpha(r"Spoons! 12? \/@# ,.1 12 Yeah? {[]}"))
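
As a quick sanity check, the two-regex version from the question and the single-regex version can be compared on a few inputs. This is only a sketch: the function names two_pass and one_pass and the extra sample strings are made up for illustration and are not part of the original post.

```python
import re

def two_pass(s):
    # Approach from the question: replace every non-letter with a space,
    # then collapse each run of whitespace into a single space.
    s = re.sub(r"[^A-Za-z]", " ", s)
    return re.sub(r"\s+", " ", s)

def one_pass(s):
    # Single-pattern approach from the answer.
    return re.sub(r"\s*[^A-Za-z]+\s*", " ", s)

# Hypothetical sample inputs; the first one is the string from the question.
samples = [
    r"Spoons! 12? \/@# ,.1 12 Yeah? {[]}",
    "a  b",
    "x1y",
    "",
    "  lead and trail  ",
]

for s in samples:
    assert two_pass(s) == one_pass(s)

print(repr(one_pass(samples[0])))  # 'Spoons Yeah '
```

Printing with repr makes the trailing space visible: both versions leave one space where the final run of junk was.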


The pattern details:

\s* - zero or more whitespace characters
[^A-Za-z]+ - one or more characters other than ASCII letters
\s* - zero or more whitespace characters, as above.
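
One observation on the breakdown above: whitespace is itself not an ASCII letter, so [^A-Za-z]+ on its own already consumes any whitespace inside or next to a run of junk. The sketch below checks that the bare class gives the same result on the question's input, and shows one way to drop the leading and trailing space if that is wanted. The names WITH_WS, BARE and junk_to_alpha_stripped are hypothetical, and the .strip() call is an assumption about the desired output rather than something the original code does.

```python
import re

# Pattern from the answer, and the bare character class for comparison.
WITH_WS = re.compile(r"\s*[^A-Za-z]+\s*")
BARE = re.compile(r"[^A-Za-z]+")

def junk_to_alpha_stripped(s):
    # Assumption: leading/trailing spaces are unwanted, so strip them.
    return BARE.sub(" ", s).strip()

text = r"Spoons! 12? \/@# ,.1 12 Yeah? {[]}"

# Both patterns match the same maximal runs of non-letters here.
assert WITH_WS.sub(" ", text) == BARE.sub(" ", text)

print(repr(BARE.sub(" ", text)))           # 'Spoons Yeah '
print(repr(junk_to_alpha_stripped(text)))  # 'Spoons Yeah'
```

Compiling the patterns once at module level is optional; re.sub with a pattern string behaves the same way.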
