I have this function, which is intended to take a string as input and replace anything that isn't a letter, numeric digit, underscore or dash with an underscore:
import re

def clean_label_value(label_value):
    """
    GCP label values have to follow strict guidelines.
    Keys and values can only contain lowercase letters, numeric characters, underscores,
    and dashes. International characters are allowed.
    https://cloud.google.com/compute/docs/labeling-resources#restrictions
    :param label_value: label value that needs to be cleaned up
    :return: cleaned label value
    """
    full_pattern = re.compile('[^a-zA-Z0-9]')
    return re.sub(full_pattern, '_', label_value).lower()
I have this unit test, which succeeds:
    def test_clean_label_value(self):
        self.assertEqual(clean_label_value('XYZ_@:.;\\/,'), 'xyz________')
However, it's replacing dashes, which I don't want it to. To demonstrate:
    def clean_label_value(label_value):
        full_pattern = re.compile('[^a-zA-Z0-9]|-')
        return re.sub(full_pattern, '_', label_value).lower()
but this test:
    def test_clean_label_value(self):
        self.assertEqual(clean_label_value('XYZ-'), 'xyz-')
then fails with:
xyz- != xyz_
Expected :xyz_
Actual :xyz-
In other words, the - is getting replaced with a _. I don't want that to happen. I've fiddled around with the regex, trying all sorts of different combinations, but I can't figure the darned thing out. Anyone?
Reputation: 23064
Put a single - at the very beginning or end of the set (character class). There it doesn't create a character range, but represents the literal - character itself.
re.compile('[^-a-zA-Z0-9]')
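As a rough sketch of how this could slot into the function from the question (the module-level name LABEL_PATTERN and the extra _ inside the set are my own additions, not something from the question; replacing _ with _ is a no-op either way):

```python
import re

# The dash comes first in the set, so it is a literal '-' and not a range operator.
# LABEL_PATTERN is a hypothetical name; the '_' in the set is an assumption added
# for clarity (underscores would otherwise just be replaced by underscores).
LABEL_PATTERN = re.compile(r'[^-a-zA-Z0-9_]')


def clean_label_value(label_value):
    """Replace every character that is not a letter, digit, '_' or '-' with '_'."""
    return LABEL_PATTERN.sub('_', label_value).lower()


print(clean_label_value('XYZ-'))           # dash is kept
print(clean_label_value('XYZ_@:.;\\/,'))   # punctuation becomes underscores
```

With this pattern both tests from the question should pass, since the dash is no longer matched.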
It's also possible to escape the - with a \, to indicate that it's a literal dash character and not a range operator inside a set.
re.compile(r'[^\-\w]')
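The special sequence \w ("w" for "word character") is equivalent to the set [a-zA-Z0-9_] when the re.ASCII flag is set. For str patterns in Python 3 it also matches Unicode letters and digits by default, which fits the "international characters are allowed" part of the label rules.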