user2758991

Reputation: 7

Replacing all regular expression matches except a certain character

I would like to know how to remove every character that a regular expression matches, except for one chosen character.

I need to clean data. An example of the data is

`some-really,dirty.data%#$_.`

which I would like to look like

some-reallydirtydata_

Note the - between some and really. That is my chosen character that I would not like to remove.

Here is a snippet of my code:

import re

unclean_string = "some-really,dirty.data%#$_."
clean_string = re.sub(r'\W', '', unclean_string)

print(clean_string)
# Output: somereallydirtydata_

I know that \W matches everything except "0 to 9, a to z, A to Z, and underscore", so substituting it with an empty string removes everything else.

I want to know how I can keep all of those, plus a chosen character (such as -).

Disclaimer: I apologise in advance for asking such a basic question. I am new to Python and using regex.

Upvotes: 0

Views: 79

Answers (2)

Sabuj Hassan

Reputation: 39365

Include the hyphen in a negated character class alongside \w:

clean_string = re.sub(r'[^-\w]', '', unclean_string)

Explanation of the regex:

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  [^-\w]                   any character except: '-', word characters
                           (a-z, A-Z, 0-9, _)
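As a quick check, here is a minimal sketch that runs this pattern against the sample string from the question. The pattern is written as a raw string here, and the hyphen is placed first inside the brackets so that it is read as a literal character rather than as part of a range.

```python
import re

unclean_string = "some-really,dirty.data%#$_."

# The hyphen comes first inside the brackets, so it is a literal '-'
# and not the start of a range such as a-z.
clean_string = re.sub(r'[^-\w]', '', unclean_string)

print(clean_string)  # some-reallydirtydata_
```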

Upvotes: 0

Jerry

Reputation: 71538

You can use:

clean_string = re.sub(r'[^\w-]', '', unclean_string)

[^\w] is equivalent to \W, so adding a - inside the negated class means the hyphen is not matched (and therefore not removed) either.

Note: I also made the regex a raw string (the r prefix) because it is good practice to do so. It stops Python from interpreting the backslashes itself, which prevents unexpected behaviour when escaping.
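If you later want to keep more than one extra character, the same idea can be wrapped in a small helper. This is only a sketch: the function name `clean` and its `keep` parameter are hypothetical, not part of the `re` module. It uses `re.escape` so that characters such as - or ] stay literal inside the brackets.

```python
import re

def clean(text, keep="-"):
    # `clean` and `keep` are hypothetical names used only for illustration.
    # re.escape makes characters like '-', ']' or '^' safe inside the class.
    pattern = r'[^\w' + re.escape(keep) + r']'
    return re.sub(pattern, '', text)

print(clean("some-really,dirty.data%#$_."))             # some-reallydirtydata_
print(clean("some-really,dirty.data%#$_.", keep="-."))  # some-reallydirty.data_.
```

Escaping the kept characters means the caller does not have to worry about where the hyphen sits inside the brackets.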

Upvotes: 2
