Reputation: 1152
I have a doubt about how Python's regex matching works. Here is my sample test.
>>> re.match(r'(\w+)', 'a-b')
<_sre.SRE_Match object at 0x7f51c0033210>
>>> re.match(r'(\w+):(\d+)', 'a-b:1')
>>>
Why doesn't the second regex return a match object, when the first one does for a plain string match, regardless of the special characters in the string?
As far as I understand, \w+ matches [a-z,A-Z,_]. I'm not clear why (\w+) returns a match object for the string 'a-b'. How can I check whether the given string doesn't contain any special characters?
Upvotes: 2
Views: 4394
Reputation: 239453
The docs for match say:
If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding MatchObject instance.
The match method returns a match object only if it finds a match at the beginning of the string. (\w+) matches the a in a-b.
print(re.match(r'(\w+)', 'a-b').group())
will print
a
In the second case ((\w+):(\d+)), the only part of the string that could match is b:1, which is not at the beginning of the string. That's why it returns None.
How can I check whether the given string doesn't contain any special characters?
I would say the second regular expression you used, together with the match function, should be enough. I insist on match, since there are differences between match and search: http://docs.python.org/2.7/library/re.html#search-vs-match
Remember, you
Upvotes: 1
Reputation: 387587
Taking a look at the actual match will give you an idea of what happens.
>>> re.match(r'(\w+)', 'a-b')
<_sre.SRE_Match object at 0x0000000002DE45D0>
>>> _.groups()
('a',)
As you can see, the expression matched a. The character class \w only contains actual word characters, not separators like dashes. So you can't match a-b using just \w+.
Now, in the second expression one might think that it would at least match b:1, given that \w+ matches b and :(\d+) matches the :1. However, that does not happen because of how re.match works. As the documentation hints, it only tries to match "at the beginning of string". So re.match behaves as if there were an implicit ^ at the beginning of the expression, which makes it match only from the start. It therefore tries to find a match starting with a, and fails because a is followed by - rather than :.
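The anchoring can be checked directly. This is a minimal sketch using the question's pattern and string; the variable names are only for illustration, and \A is used here as an explicit start-of-string anchor to mimic what re.match does implicitly:

```python
import re

pattern = r'(\w+):(\d+)'

# re.match only tries the pattern at position 0. On 'a-b:1' the \w+ part
# consumes 'a', and the next character is '-' rather than ':', so it fails.
implicit = re.match(pattern, 'a-b:1')

# re.search with an explicit start anchor is restricted in the same way.
explicit = re.search(r'\A' + pattern, 'a-b:1')

# When the matching part sits at the start of the string, re.match succeeds.
at_start = re.match(pattern, 'b:1')

print(implicit)           # None
print(explicit)           # None
print(at_start.groups())  # ('b', '1')
```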
Instead, you can use re.search, which looks through the whole string to see whether the expression matches anywhere. There, you will get a result:
>>> re.search(r'(\w+):(\d+)', 'a-b:1')
<_sre.SRE_Match object at 0x0000000002E01B58>
>>> _.groups()
('b', '1')
For further information on the search vs. match topic, check the "search() vs. match()" section of the re module documentation.
And finally, if you want to match dashes too, you can use a character class such as [\w-], for example:
>>> re.match(r'([\w-]+):(\d+)', 'a-b:1')
<_sre.SRE_Match object at 0x0000000002E01B58>
>>> _.groups()
('a-b', '1')
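As for the original question of checking that a string contains no special characters at all: re.match anchors only the start, so the pattern also needs an anchor at the end. The following is a sketch, assuming "special" means anything other than word characters and dashes; the helper name is_clean and the constant CLEAN are made up for illustration:

```python
import re

# Hypothetical helper: True only if the whole string consists of
# word characters and dashes. \Z anchors the match at the very end.
CLEAN = re.compile(r'[\w-]+\Z')

def is_clean(s):
    return CLEAN.match(s) is not None

print(is_clean('a-b'))    # True
print(is_clean('a-b:1'))  # False, ':' is not in the class
print(is_clean('a b'))    # False, whitespace is not in the class
```

On Python 3.4 and later, re.fullmatch(r'[\w-]+', s) expresses the same check without the explicit \Z.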
Upvotes: 6
Reputation: 142136
The first matches the a: one or more word chars.
The second requires one or more word chars immediately followed by a :, and there is no such sequence at the start of the string.
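In [a-z,A-Z,_], a-z and A-Z mean the ranges a to z and A to Z; the - is not a literal hyphen in this context. (Note that for ASCII text \w is actually equivalent to [a-zA-Z0-9_]: it includes digits, and the commas in your class would match literal commas.) If you do want a literal hyphen, put it as the first or last character of the character class.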
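A quick way to see how the hyphen and the commas behave inside a character class is to list what each class picks out of a sample string. This is only an illustrative sketch; the sample strings and variable names are made up:

```python
import re

# Inside a class, a-z and A-Z are ranges; the commas are matched literally,
# and the '-' in the input is not matched.
with_commas = re.findall(r'[a-z,A-Z,_]', 'a-b,c')

# A hyphen placed last (or first) in the class is a literal hyphen.
with_hyphen = re.findall(r'[a-zA-Z_-]', 'a-b,c')

# \w also covers digits, which [a-z,A-Z,_] does not.
word_chars = re.findall(r'\w', 'a1_')

print(with_commas)  # ['a', 'b', ',', 'c']
print(with_hyphen)  # ['a', '-', 'b', 'c']
print(word_chars)   # ['a', '1', '_']
```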
Upvotes: 2