Mr Mystery Guest

Reputation: 1474

Regex: remove non-alpha characters, keeping spaces

I've written a simple function that strips a string of all non-alpha characters keeping spaces in place.

Currently it relies on two regular expressions. However, in the interest of brevity, I'd like to reduce those two regexes to one. Is this possible?

import re

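def junk_to_alpha(s):
  # Replace every non-letter character with a space
  reg = r"[^A-Za-z]"
  p = re.compile(reg)
  s = re.sub(p, " ", s)
  # Collapse each run of whitespace into a single space
  p = re.compile(r"\s+")
  s = re.sub(p, " ", s)
  return s

print(junk_to_alpha(r"Spoons! 12? \/@# ,.1 12 Yeah? {[]}"))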

# Spoons Yeah

Upvotes: 3

Views: 1745

Answers (1)

Wiktor Stribiżew

Reputation: 626870

You can wrap [^A-Za-z]+ in \s* on both sides:

import re

def junk_to_alpha(s):
  s = re.sub(r"\s*[^A-Za-z]+\s*", " ", s)
  return s

print(junk_to_alpha(r"Spoons! 12? \/@# ,.1 12 Yeah? {[]}"))

See the online Python demo

The pattern details:

  • \s* - zero or more whitespace characters
  • [^A-Za-z]+ - one or more characters other than ASCII letters
  • \s* - zero or more whitespace characters, as above.
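
As a quick check of the pattern breakdown above, here is a minimal sketch comparing the question's two-pass version with the single-pattern version. The names two_pass, one_pass and one_pass_short are made up for illustration. The third variant rests on an assumption worth verifying for your own inputs: whitespace is itself a non-letter, so [^A-Za-z]+ on its own should already consume each whole run, which would make the surrounding \s* optional.

```python
import re

SAMPLE = r"Spoons! 12? \/@# ,.1 12 Yeah? {[]}"

def two_pass(s):
    # The question's approach: blank out non-letters, then collapse whitespace
    s = re.sub(r"[^A-Za-z]", " ", s)
    return re.sub(r"\s+", " ", s)

def one_pass(s):
    # The single pattern from this answer
    return re.sub(r"\s*[^A-Za-z]+\s*", " ", s)

def one_pass_short(s):
    # Hypothetical shorter variant: the negated class also matches whitespace
    return re.sub(r"[^A-Za-z]+", " ", s)

for func in (two_pass, one_pass, one_pass_short):
    # repr() makes the trailing space visible
    print(func.__name__, repr(func(SAMPLE)))
```

All three leave a single trailing space on this input, because the string ends with a run of non-letters. If that matters, calling .strip() on the result removes it.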

Upvotes: 4
