Reputation: 8782
I want my string to contain only alphanumeric characters, hyphens (-), and underscores. That's it. I am trying to write a method that takes a user input string and converts it so that it follows this guideline.
My regex is obviously a-zA-Z0-9_-. What I want to do is replace all the spaces with -, and just remove all the other characters that don't fall under my regex.
So, the string 'Hello, world!' would get converted into 'Hello-world'. The special characters get removed, and the space is replaced with a -.
What would be the most efficient way to do this in Python? Do I have to iterate over the entire string character by character, or is there a better way? Thanks!
Upvotes: 1
Views: 534
Reputation: 1576
What you want is also often used when generating URL names for content. It is implemented in django.utils.text.slugify. The slugify function converts to lowercase, though. Here is a simplified version of Django's slugify function that preserves case:
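import re

def slugify(value):
    # Remove everything except letters, digits, underscores, whitespace and hyphens
    value = re.sub(r'[^A-Za-z0-9_\s-]', '', value, flags=re.U).strip()
    # Collapse runs of whitespace and hyphens into a single hyphen
    return re.sub(r'[-\s]+', '-', value, flags=re.U)

print(slugify("Hello World!"))
# Hello-World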
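A few more inputs may help show what the strip() call and the second substitution are for. This is only a sketch: it repeats the function so that it runs on its own, it assumes digits belong in the allowed set (the question asks for alphanumerics), and the sample strings are made up for illustration.

```python
import re

def slugify(value):
    # Drop everything except ASCII letters, digits, underscores,
    # whitespace and hyphens, then trim leading/trailing whitespace.
    value = re.sub(r'[^A-Za-z0-9_\s-]', '', value).strip()
    # Collapse each run of whitespace and/or hyphens into one hyphen.
    return re.sub(r'[-\s]+', '-', value)

print(slugify("  Hello,   World!  "))
# Hello-World
print(slugify("foo - bar_baz 42"))
# foo-bar_baz-42
```

Without the strip(), the leading and trailing spaces in the first input would each turn into a hyphen.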
Upvotes: 1
Reputation: 215127
You can do it with two sub calls: 1) replace spaces with -; 2) remove the other unwanted characters:
import re

s = 'Hello, world!'
re.sub(r"[^a-zA-Z_-]", "", re.sub(r"\s+", "-", s))
# 'Hello-world'
If you want to keep digits in your string:
re.sub(r"[^a-zA-Z0-9_-]", "", re.sub(r"\s+", "-", s))
# 'Hello-world'
Here [^a-zA-Z_-] matches a single character that is not a letter (upper or lower case), an underscore, or a dash. The dash needs to be placed at the end of the character class [] so that it is treated as a literal dash rather than as a range operator.
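As a rough sketch of how the two substitutions could be wrapped up for reuse, and of why the position of the dash matters: the helper name clean and the module-level pattern names below are hypothetical, chosen only for this example.

```python
import re

# Hypothetical names; compiling once lets repeated calls reuse the patterns.
_WHITESPACE = re.compile(r"\s+")
_DISALLOWED = re.compile(r"[^a-zA-Z0-9_-]")  # trailing '-' is a literal dash

def clean(s):
    # Step 1: turn each run of whitespace into one dash.
    # Step 2: delete every character outside the allowed set.
    return _DISALLOWED.sub("", _WHITESPACE.sub("-", s))

print(clean("Hello, world!"))
# Hello-world

# A dash between two characters defines a range instead:
# '+-/' covers the ASCII characters '+', ',', '-', '.', '/'.
print(re.findall(r"[+-/]", "a+b,c-d.e/f"))
# ['+', ',', '-', '.', '/']

# With the dash moved to the end, only '+', '/' and '-' match.
print(re.findall(r"[+/-]", "a+b,c-d.e/f"))
# ['+', '-', '/']
```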
Upvotes: 3