Reputation: 10879
I want to strip all special characters from a Python string, except dashes and spaces.
Is this correct?
import re
my_string = "Web's GReat thing-ok"
pattern = re.compile('[^A-Za-z0-9 -]')
new_string = pattern.sub('', my_string)
new_string
>> 'Webs GReat thing-ok'
# then make it lowercase and replace spaces with underscores
# new_string = new_string.lower().replace(" ", "_")
# new_string
# >> 'webs_great_thing-ok'
As shown, I ultimately want to replace the spaces with underscores after removing the other special characters, but figured I would do it in stages. Is there a Pythonic way to do it all in one fell swoop?
For context, I am using this input for MongoDB collection names, so I want the final string to be alphanumeric, with dashes and underscores allowed.
Upvotes: 1
Views: 16607
Reputation: 174662
A one-liner, as requested:
>>> import re, unicodedata
>>> value = "Web's GReat thing-ok"
>>> re.sub(r'\s+', '_', re.sub(r'[^\w\s-]', '', unicodedata.normalize('NFKD', unicode(value)).encode('ascii', 'ignore').decode('ascii')).strip().lower())
u'webs_great_thing-ok'
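If the input is known to be plain ASCII, the normalization step may not be needed, and both substitutions can be folded into a single re.sub call with a replacement function. The following is only a sketch under that assumption, written for Python 3; the helper name to_collection_name is made up for illustration.

```python
import re

# Matches either an unwanted character or a single space.
# Assumption: input is plain ASCII; accented letters would simply be dropped.
_UNWANTED_OR_SPACE = re.compile(r"[^A-Za-z0-9 -]| ")


def to_collection_name(text):
    """Drop special characters, turn spaces into underscores, lowercase."""
    # A matched space becomes an underscore; any other match is removed.
    return _UNWANTED_OR_SPACE.sub(
        lambda m: "_" if m.group() == " " else "", text
    ).lower()


print(to_collection_name("Web's GReat thing-ok"))  # webs_great_thing-ok
```

Unlike the one-liner above, this version does not strip leading or trailing whitespace and does not collapse runs of spaces, so two consecutive spaces become two underscores.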
Upvotes: 1
Reputation: 81654
You are actually trying to "slugify" your string.
If you don't mind using a third-party (and Python 2 specific) library, you can use slugify (pip install slugify):
import slugify
string = "Web's GReat thing-ok"
print slugify.slugify(string)
>> 'webs-great-thing-ok'
You can also implement it yourself. All of slugify's code is:
import re
import unicodedata

def slugify(string):
    return re.sub(r'[-\s]+', '-',
            unicode(
                re.sub(r'[^\w\s-]', '',
                    unicodedata.normalize('NFKD', string)
                    .encode('ascii', 'ignore'))
                .strip()
                .lower()))
Note that this is Python 2 specific.
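For Python 3, where unicode() no longer exists, the same steps could be written roughly as below. This is a sketch, not the library's own code: the function name slugify_py3 and the separator parameter are assumptions added for illustration.

```python
import re
import unicodedata


def slugify_py3(text, separator="-"):
    """Rough Python 3 equivalent of the slugify() function shown above."""
    # Decompose accented characters, then drop anything that is not ASCII.
    ascii_text = (
        unicodedata.normalize("NFKD", text)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Remove everything except word characters, whitespace and dashes.
    cleaned = re.sub(r"[^\w\s-]", "", ascii_text).strip().lower()
    # Collapse each run of dashes and/or whitespace into one separator.
    return re.sub(r"[-\s]+", separator, cleaned)


print(slugify_py3("Web's GReat thing-ok"))  # webs-great-thing-ok
print(slugify_py3("Caf\u00e9  au lait"))    # cafe-au-lait
```

Passing separator="_" replaces existing dashes as well, because the final pattern matches both dashes and whitespace. To keep dashes and only convert spaces, as in the question, the last pattern would have to be r"\s+" instead.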
Going back to your example, you can make it a one-liner. Whether it is Pythonic enough is up to you to decide (keep the range as A-Za-z; the shorter A-z also matches the characters [, \, ], ^, _ and ` that sit between Z and a in ASCII):
import re
my_string = "Web's GReat thing-ok"
new_string = re.sub('[^A-Za-z0-9 -]', '', my_string).lower().replace(" ", "_")
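A quick way to check how the two ranges behave: A-z covers ASCII codes 65 through 122, which includes six punctuation characters (codes 91 through 96) in addition to the letters. The sample string below is made up for the comparison.

```python
import re

# Letters interleaved with the ASCII characters that sit between "Z" and "a".
sample = "a[b]c^d`e\\f"

# A-z spans codes 65..122, so [ \ ] ^ _ and ` all survive the substitution.
loose = re.sub(r"[^A-z0-9 -]", "", sample)
# A-Za-z keeps letters only.
strict = re.sub(r"[^A-Za-z0-9 -]", "", sample)

print(loose)   # a[b]c^d`e\f
print(strict)  # abcdef
```

For MongoDB collection names, the stricter A-Za-z class is probably the one that matches the stated constraint.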
UPDATE: There seems to be a more robust and Python 3 compatible "slugify" library here.
Upvotes: 4