user3541631

Reputation: 4008

Regex expression issue: ':' and '/' are not removed

From a string I need to remove everything that is not a letter, number, space, or '-'.

I use:

import re
regex = re.compile('^[,?!`@#$%^&*()+=.:/]+')
name = regex.sub('', my_text)

But if I have the text:

lorem ipsum: 100 gb/s and beyond

the regex above doesn't remove ':' or '/'.

Upvotes: 0

Views: 41

Answers (2)

user1531591

You need to remove the ^ (the start-of-string anchor). As a side note, the + is not necessary:

regex = re.compile('[,?!`@#$%^&*()+=.:/]')
name = regex.sub('', my_text)

Demo: https://regex101.com/r/DjTvwL/1
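A quick check of both points, run against the sample text from the question: once the ^ anchor is gone, the listed characters are removed wherever they appear, and the result is the same with or without the trailing +. The patterns are written as raw strings here out of habit; they contain no backslashes, so that changes nothing.

```python
import re

my_text = 'lorem ipsum: 100 gb/s and beyond'

# Same character class as above, without the ^ anchor: one character per match.
without_plus = re.compile(r'[,?!`@#$%^&*()+=.:/]')
# With +, each run of listed characters is removed in a single match;
# the final string is identical.
with_plus = re.compile(r'[,?!`@#$%^&*()+=.:/]+')

print(without_plus.sub('', my_text))  # lorem ipsum 100 gbs and beyond
print(with_plus.sub('', my_text))     # lorem ipsum 100 gbs and beyond
```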

I re-read your description: since you want to remove everything except letters, digits, spaces, and '-', your current regex does not fit, because it lets through [, _, " and so on. You are better off using a negated character class:

import re
my_regex = re.compile(r'[^0-9A-Za-z\-\s]')  # 0-9: digits; A-Za-z: letters; \-: the '-' character; \s: any whitespace
my_text = 'lorem ipsum: 100 gb/s and beyond'

name = my_regex.sub('', my_text)

print(name)  # lorem ipsum 100 gbs and beyond
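To illustrate why listing characters to remove falls short, here is a sketch comparing the original list-based class with the negated class on a hypothetical input containing [, ], ", and _, none of which the list covers.

```python
import re

# Hypothetical input with characters the original list does not mention.
sample = 'lorem [ipsum] "100" gb/s_and-beyond'

blacklist = re.compile(r'[,?!`@#$%^&*()+=.:/]')  # lists characters to remove
whitelist = re.compile(r'[^0-9A-Za-z\-\s]')      # removes anything not explicitly allowed

print(blacklist.sub('', sample))  # lorem [ipsum] "100" gbs_and-beyond
print(whitelist.sub('', sample))  # lorem ipsum 100 gbsand-beyond
```

Only the '/' is caught by the list, while the negated class removes every character outside the allowed set.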

Upvotes: 3

Mad Physicist

Reputation: 114290

Rather than trying to list every symbol, dingbat, and other character you want to remove, I would recommend implementing "everything that is not a letter, number, space, or '-'" literally:

regex = re.compile('[^a-zA-Z0-9 -]')
name = regex.sub('', my_text)

You can also use shorthand classes such as \w inside your character class. If you are OK with underscores counting as letters (\w matches digits too), and you want to support Unicode letters, the following is more concise:

regex = re.compile(r'[^\w -]')
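A small comparison of the two classes on a hypothetical input with an accented letter and an underscore. For str patterns, Python 3's \w is Unicode-aware by default, so 'é' and '_' survive with the \w version but are stripped by the ASCII-only class.

```python
import re

ascii_only = re.compile(r'[^a-zA-Z0-9 -]')  # ASCII letters, digits, space, '-'
word_chars = re.compile(r'[^\w -]')         # Unicode word characters (incl. '_'), space, '-'

sample = 'café_au-lait!'  # hypothetical input

print(ascii_only.sub('', sample))  # cafau-lait
print(word_chars.sub('', sample))  # café_au-lait
```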

The problem with your original expression is that ^ outside a character class anchors the match to the start of the string (or to the start of each line when re.MULTILINE is set). Your expression can therefore only remove the characters you specified from the beginning of the string.
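A sketch of the anchoring behaviour: the question's sample starts with 'l', so the anchored pattern never matches, and on a hypothetical input that does begin with listed characters only that leading run is removed. The last pattern shows how re.MULTILINE lets ^ match after each newline as well.

```python
import re

anchored = re.compile(r'^[,?!`@#$%^&*()+=.:/]+')

# No match at position 0, so nothing is removed.
print(anchored.sub('', 'lorem ipsum: 100 gb/s and beyond'))  # lorem ipsum: 100 gb/s and beyond
# Only the leading ':/' is removed; the later ':' and '/' survive.
print(anchored.sub('', ':/lorem: gb/s'))  # lorem: gb/s

# With re.MULTILINE, ^ also matches right after each newline.
multi = re.compile(r'^[:/]+', re.MULTILINE)
print(repr(multi.sub('', ':a\n/b')))  # 'a\nb'
```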

Upvotes: 1
