Reputation: 4008
From a string I need to remove everything that is not a letter, number, space, or '-'.
I use:
regex = re.compile('^[,?!`@#$%^&*()+=.:/]+')
name = regex.sub('', my_text)
But if I have the text:
lorem ipsum: 100 gb/s and beyond
The regex from the example above doesn't remove ':' or '/'.
Upvotes: 0
Views: 41
Reputation:
You need to remove the ^ (start-of-line anchor). On a side note, the + is not mandatory:
regex = re.compile('[,?!`@#$%^&*()+=.:/]')
name = regex.sub('', my_text)
Demo: https://regex101.com/r/DjTvwL/1
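To illustrate the side note about +, here is a small sketch. The sample string and variable names are made up for the example. Both patterns produce the same cleaned text; the only difference is how many substitutions are performed, which re.subn reports.

```python
import re

sample = 'wow!!! 100 gb/s'  # hypothetical input, not from the question

with_plus = re.compile('[,?!`@#$%^&*()+=.:/]+')    # removes runs of symbols in one go
without_plus = re.compile('[,?!`@#$%^&*()+=.:/]')  # removes symbols one at a time

# subn returns (new_string, number_of_substitutions)
plus_result, plus_count = with_plus.subn('', sample)
single_result, single_count = without_plus.subn('', sample)

print(plus_result, plus_count)      # same text, fewer substitutions
print(single_result, single_count)  # same text, one substitution per symbol
```

With + each run of symbols is replaced in a single step, so it is a matter of taste (and a minor efficiency gain), not correctness.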
After re-reading your description: since you want to remove everything except letters, digits, spaces, and '-', your current regex does not fit. It lets characters such as [, _, and " through, so a negated character class is a better choice:
import re
my_regex = re.compile(r'[^0-9A-Za-z\-\s]')  # 0-9: digits; A-Za-z: letters; \-: the '-' char; \s: any whitespace (tabs and newlines too)
my_text = 'lorem ipsum: 100 gb/s and beyond'
name = my_regex.sub('', my_text)
print(name)
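As a rough comparison of the two approaches, the sketch below runs both patterns on a string containing characters that the explicit list does not cover. The sample string is hypothetical.

```python
import re

# Hypothetical input containing '[', ']', '_' and '"', none of which are in the explicit list
sample = 'lorem [ipsum]_"100" gb/s - beyond'

listed = re.compile('[,?!`@#$%^&*()+=.:/]')    # removes only the listed symbols
negated = re.compile(r'[^0-9A-Za-z\-\s]')      # removes everything except the allowed characters

listed_result = listed.sub('', sample)
negated_result = negated.sub('', sample)

print(listed_result)   # brackets, underscore and quotes survive
print(negated_result)  # only letters, digits, whitespace and '-' survive
```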
Upvotes: 3
Reputation: 114290
Rather than trying to capture all possible symbols, dingbats, and whatever other characters you want to remove, I would recommend implementing "everything that is not a letter, number, space, or '-'" literally:
regex = re.compile('[^a-zA-Z0-9 -]')
name = regex.sub('', my_text)
You can use shorthand classes such as \w inside your character class. If you are OK with underscores counting as letters, and you want to support Unicode letters, the following is more concise:
regex = re.compile(r'[^\w -]')
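A small sketch of how the two classes differ on non-ASCII input. The sample string is made up, and the re.ASCII variant is an extra shown only for contrast; it is not part of the suggestion above.

```python
import re

# Hypothetical input: 'café_au-lait: 2€ total', written with escapes to avoid encoding ambiguity
sample = 'caf\u00e9_au-lait: 2\u20ac total'

ascii_only = re.compile('[^a-zA-Z0-9 -]')           # explicit ASCII ranges
unicode_word = re.compile(r'[^\w -]')               # \w: Unicode letters, digits and '_'
ascii_word = re.compile(r'[^\w -]', flags=re.ASCII) # \w limited to [a-zA-Z0-9_]

ascii_only_result = ascii_only.sub('', sample)
unicode_word_result = unicode_word.sub('', sample)
ascii_word_result = ascii_word.sub('', sample)

print(ascii_only_result)    # drops the accented letter and the underscore
print(unicode_word_result)  # keeps both
print(ascii_word_result)    # keeps the underscore, drops the accented letter
```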
The problem with your original expression is that ^ outside a character class anchors the match to the start of the string (or of each line when re.MULTILINE is set). Your expression can only remove the characters you specified from the beginning of the string.
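The anchoring behaviour can be checked with a short sketch. The two-line sample string is hypothetical and chosen so that symbols appear both at the start of a line and in the middle.

```python
import re

symbols = '[,?!`@#$%^&*()+=.:/]+'
sample = '!!lorem: ipsum\n??dolor/sit'  # hypothetical two-line input

anchored = re.compile('^' + symbols)                  # matches only at the start of the string
per_line = re.compile('^' + symbols, re.MULTILINE)    # matches at the start of every line
unanchored = re.compile(symbols)                      # matches anywhere

anchored_result = anchored.sub('', sample)
per_line_result = per_line.sub('', sample)
unanchored_result = unanchored.sub('', sample)

print(repr(anchored_result))    # only the leading '!!' is removed
print(repr(per_line_result))    # leading symbols of each line are removed
print(repr(unanchored_result))  # every listed symbol is removed
```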
Upvotes: 1