Reputation: 641
I've been trying to teach myself regular expressions as I've mostly somehow avoided it so far.
However I have a puzzler.
Here's the code.
import re
listostuff = ["crustybread", "rusty nail", "grust0", "superrust"]
for item in listostuff:
    result = True if re.match(r'[a-z]+rust[a-z0-9\s \t\s+]+', item) else False
    print item, result
and here's the result:
crustybread True
rusty nail False
grust0 True
superrust False
I expect superrust not to match, but I would expect "rusty nail" to match this.
I've put every whitespace character I can find in the re set but it doesn't pick it up. I've also tried combinations with just single ones. They don't seem to match rusty nail.
Can someone tell me what I'm doing wrong? (Incidentally, I have searched this site, and the whitespace characters appear to be the ones I have here.)
So my goal is to have all match true except superrust.
Upvotes: 1
Views: 1077
Reputation: 626748
You need to make sure the pattern allows matching 0 or more letters at the beginning. Replace [a-z]+ with [a-z]*:
re.match(r'[a-z]*rust[a-z0-9\s]+', item)
#               ^
Note that re.match only anchors the match at the start of the string. Add $ at the end of the pattern if you want the whole input string to match your pattern.
See the regex demo.
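As a rough check of the suggestion above, here is a minimal Python 3 sketch that runs the corrected pattern over the four strings from the question. The names items, results, prefix_ok and whole_ok, and the extra test string "rusty nail!", are made up for illustration and are not part of the original post.

```python
import re

items = ["crustybread", "rusty nail", "grust0", "superrust"]

# [a-z]* allows zero letters before "rust"; the final class still
# requires at least one letter, digit or whitespace character after it.
pattern = re.compile(r'[a-z]*rust[a-z0-9\s]+')

results = {item: bool(pattern.match(item)) for item in items}
for item, matched in results.items():
    print(item, matched)

# re.match only anchors at the start, so trailing junk is ignored
# unless the pattern ends with $ (illustrative input, not from the question).
prefix_ok = bool(re.match(r'[a-z]*rust[a-z0-9\s]+', "rusty nail!"))
whole_ok = bool(re.match(r'[a-z]*rust[a-z0-9\s]+$', "rusty nail!"))
print(prefix_ok, whole_ok)
```

With this pattern, "superrust" is still rejected because nothing follows "rust", which is the outcome the question asks for.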
Upvotes: 1
Reputation: 226256
The issue is that \s+ won't do what you want inside the []: there, the + is a literal plus sign, not a quantifier.
You will need something like rust[a-z0-9]+\s+[a-z0-9]+ which will make the space required instead of optional.
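To illustrate both points, here is a small Python 3 sketch. The variable names (plus_is_literal, required_space, matches) are hypothetical, and it assumes the pattern is used with re.match as in the question.

```python
import re

# Inside a character class, "+" is an ordinary character, so [\s+]
# matches one whitespace character OR one literal plus sign.
plus_is_literal = re.fullmatch(r'[\s+]', '+') is not None

# Outside the class, \s+ is a quantified token: one or more whitespace
# characters are required between the two alphanumeric runs.
required_space = re.compile(r'rust[a-z0-9]+\s+[a-z0-9]+')

items = ["crustybread", "rusty nail", "grust0", "superrust"]
matches = {item: bool(required_space.match(item)) for item in items}
print(plus_is_literal, matches)
```

Because this pattern starts with rust and requires whitespace, only "rusty nail" matches among the four sample strings when it is used with re.match. It would need a leading [a-z]* and an optional whitespace part to accept "crustybread" and "grust0" as well.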
Upvotes: 0