Keef Baker

Reputation: 641

regular expression whitespace

I've been trying to teach myself regular expressions, since I've somehow mostly avoided them so far.

However, I have a puzzler.

Here's the code.

import re

listostuff = [ "crustybread", "rusty nail", "grust0", "superrust"]

for item in listostuff:
    result = bool(re.match(r'[a-z]+rust[a-z0-9\s \t\s+]+', item))
    print(item, result)

And here's the output:

crustybread True
rusty nail False
grust0 True
superrust False

I expect superrust not to match, but I would expect "rusty nail" to match this pattern.

I've put every whitespace character I can find into the character set, but it still doesn't match. I've also tried combinations with just single ones, and none of them match "rusty nail".

Can someone tell me what I'm doing wrong? (Incidentally, I have searched this site, and the whitespace characters appear to be the ones I have here.)

So my goal is for everything to match except superrust.

Upvotes: 1

Views: 1077

Answers (2)

Wiktor Stribiżew

Reputation: 626748

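You need to make sure the pattern allows zero or more letters at the beginning. "rusty nail" starts with "rust", so [a-z]+ (one or more letters) has nothing to consume before "rust", and re.match fails. Replace [a-z]+ with [a-z]*: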

re.match(r'[a-z]*rust[a-z0-9\s]+', item)
#               ^
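A minimal sketch of the corrected loop, assuming Python 3 (the question's original code used the Python 2 print statement). It applies the pattern above to the question's list:

```python
import re

listostuff = ["crustybread", "rusty nail", "grust0", "superrust"]

# [a-z]* allows zero leading letters, so "rusty nail" can match from index 0.
# [a-z0-9\s]+ still needs at least one character after "rust",
# which is why "superrust" fails.
pattern = re.compile(r'[a-z]*rust[a-z0-9\s]+')

results = {item: bool(pattern.match(item)) for item in listostuff}

for item in listostuff:
    print(item, results[item])
```

This prints True for every item except superrust.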

Note that re.match only anchors the match at the start of the string; add $ at the end of the pattern if you want the whole input string to match your pattern.
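To illustrate the difference, here is a small sketch with a hypothetical input, "rusty nail!", whose trailing "!" is outside the character class. re.fullmatch (available since Python 3.4) is shown as an alternative to writing $:

```python
import re

item = "rusty nail!"  # hypothetical input with a trailing "!"

# re.match anchors only at the start, so it happily matches the prefix.
prefix = re.match(r'[a-z]*rust[a-z0-9\s]+', item)

# With $, the match must extend to the end of the string, so this fails.
anchored = re.match(r'[a-z]*rust[a-z0-9\s]+$', item)

# re.fullmatch requires the whole string to match without writing $.
whole = re.fullmatch(r'[a-z]*rust[a-z0-9\s]+', item)

print(prefix.group())   # rusty nail
print(anchored, whole)  # None None
```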


Upvotes: 1

Raymond Hettinger

Reputation: 226256

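The issue is that \s+ won't do what you want inside the []: within a character class, + is a literal plus sign rather than a quantifier, and \s matches a single whitespace character.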
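A quick check of how a character class treats these tokens: inside [...], \s still matches any single whitespace character, but + matches a literal plus sign. The input string below is a made-up example:

```python
import re

sample = "a + b\tc"  # made-up input: letters, spaces, a plus sign, and a tab

# [\s+] is a set of single characters: any whitespace OR a literal "+".
found = re.findall(r'[\s+]', sample)
print(found)  # [' ', '+', ' ', '\t']
```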

You will need something like rust[a-z0-9]+\s+[a-z0-9]+, which makes the whitespace required instead of optional.
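A sketch applying this pattern to the question's list. It uses re.search, which is an assumption here (with re.match, only strings beginning with "rust" could match). Because \s+ now sits outside the class, it acts as a real quantifier:

```python
import re

listostuff = ["crustybread", "rusty nail", "grust0", "superrust"]

# Outside a character class, \s+ means "one or more whitespace characters",
# so whitespace must appear between the two alphanumeric runs.
pattern = re.compile(r'rust[a-z0-9]+\s+[a-z0-9]+')

matches = [item for item in listostuff if pattern.search(item)]
print(matches)  # ['rusty nail']
```

Only the item containing whitespace after "rust..." matches, which is what making the space required implies.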

Upvotes: 0
