Reputation: 1033
I am using Python's re module for regular expressions. I want to remove every character from a string except letters and digits. To achieve this I am using the sub function.
Code snippet:
>>> import re
>>> text = "foo.bar"
>>> re.sub("[^A-Z][^a-z]", "", text)
'fobar'
Why does the above expression remove the "o."?
I do not understand why it removes the "o" as well as the dot. Can someone please explain what is going on behind the scenes?
I know that the correct solution to this problem is
>>> re.sub("[^A-Z ^a-z]","",text)
'foobar'
Thanks in advance
Upvotes: 1
Views: 111
Reputation: 23516
A very important aspect to realize is that [^A-Z][^a-z] represents two characters (one for each character group), while [^A-Za-z] represents only one.
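To see the difference in how many characters each pattern consumes, here is a small sketch using re.findall on the string from the question (the variable names are just for illustration):

```python
import re

text = "foo.bar"

# Two character classes in a row consume two characters per match.
two_classes = re.findall(r"[^A-Z][^a-z]", text)

# A single combined class consumes one character per match.
one_class = re.findall(r"[^A-Za-z]", text)

print(two_classes)  # ['o.']
print(one_class)    # ['.']
```

The first pattern can only match a pair of adjacent characters, which is why the "o" is removed together with the dot.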
Upvotes: 3
Reputation: 2958
The [^A-Z] means any character except uppercase A to Z. The second o in foo.bar is not uppercase, so it matches; as a matter of fact, every character in foo.bar matches at this point.
Then you add [^a-z], which looks for a character that is not lowercase; only the dot matches.
Combine both and you are looking for a non-uppercase character immediately followed by a non-lowercase character, so the pattern matches "o." (the second o plus the dot).
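The same reasoning can be checked step by step. This is only a sketch for the foo.bar example; not_upper, not_lower and m are names made up here:

```python
import re

text = "foo.bar"

# Characters that satisfy each class on its own.
not_upper = [c for c in text if re.fullmatch(r"[^A-Z]", c)]
not_lower = [c for c in text if re.fullmatch(r"[^a-z]", c)]

# The combined pattern needs one of each, side by side.
m = re.search(r"[^A-Z][^a-z]", text)

print(not_upper)            # ['f', 'o', 'o', '.', 'b', 'a', 'r']
print(not_lower)            # ['.']
print(m.span(), m.group())  # (2, 4) o.
```

The only place where a non-uppercase character sits directly before a non-lowercase one is at index 2, so that two-character slice is what sub deletes.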
The solution is the one proposed by Ignacio.
Upvotes: 2
Reputation: 798626
Because o matches [^A-Z] and . matches [^a-z].
And the correct solution is [^A-Za-z0-9].
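A quick sketch of that pattern in use, assuming ASCII input. The helper name keep_alnum and the second test string are made up for illustration. The last line shows that the question's [^A-Z ^a-z] is not equivalent: inside a class, a space and a non-leading ^ are literal characters, so they are kept, and digits are removed.

```python
import re

def keep_alnum(s):
    # Hypothetical helper: drop everything except ASCII letters and digits.
    return re.sub(r"[^A-Za-z0-9]", "", s)

result_simple = keep_alnum("foo.bar")
result_mixed = keep_alnum("foo bar^1")

# The pattern from the question keeps spaces and carets, and drops digits.
result_question = re.sub(r"[^A-Z ^a-z]", "", "foo bar^1")

print(result_simple)    # foobar
print(result_mixed)     # foobar1
print(result_question)  # foo bar^
```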
Upvotes: 1