Reputation: 25928
I have a regular expression that grabs the suburb from a string that usually contains a suburb and industry in the format:
INDUSTRY - SUBURB
Sometimes the string does not contain the INDUSTRY - part and has just the suburb. In this case my regular expression fails to grab anything.
Is there a way to make the regex robust enough to grab everything after the hyphen if it's present, and otherwise just grab everything?
The following regex doesn't work: (- |^)(.*)(,|$)
The result is: dvertising - Roseville Chase
Upvotes: 1
Views: 421
Reputation: 1044
Instead of using (.*), use ([^-]*):
(- |^)([^-]*)(,|$)
In action:
import re
re.search(r"(- |^)([^-]*)(,|$)", "Advertising - Roseville Chase").group(2)
Out[97]: 'Roseville Chase'
re.search(r"(- |^)([^-]*)(,|$)", "Roseville Chase").group(2)
Out[98]: 'Roseville Chase'
More explanation was requested:
[^-] means "any character except for -". By using [^-], you are making it impossible for the regex to match the entire string if there is a hyphen present. It will have to match everything after the hyphen.
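A minimal sketch of that difference in Python, comparing the question's pattern with the one above. The helper name second_group is hypothetical and exists only for this illustration.

```python
import re

GREEDY = r"(- |^)(.*)(,|$)"        # pattern from the question
NO_HYPHEN = r"(- |^)([^-]*)(,|$)"  # pattern from this answer


def second_group(pattern, text):
    """Return group 2 of the first match, or None if nothing matches."""
    match = re.search(pattern, text)
    return match.group(2) if match else None


# With (.*), the ^ alternative succeeds at position 0 and .* runs
# straight through the hyphen, so the industry is captured as well.
print(second_group(GREEDY, "Advertising - Roseville Chase"))

# With ([^-]*), a match anchored at ^ cannot cross the hyphen, so the
# search moves on until "- " matches and only the suburb is captured.
print(second_group(NO_HYPHEN, "Advertising - Roseville Chase"))
print(second_group(NO_HYPHEN, "Roseville Chase"))
```

One consequence worth keeping in mind: because [^-] excludes every hyphen, a suburb that itself contains a hyphen would not match this pattern at all.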
Upvotes: 2
Reputation: 142126
Well... it's much easier to do this without a regex. I have to sit and grok the other answers, and that's not what Python's about. I agree with Robert.
I'd just go for:
def suburb_or_all(text):
    # partition returns (before, separator, after); separator is '' if ' - ' is absent
    industry, hyphen_present, suburb = text.partition(' - ')
    return suburb if hyphen_present else text
Completely readable, self-documenting and remarkably efficient.
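To see why the middle element works as a "was the separator found" flag, here is a short sketch of what str.partition returns in both cases. The variable names are only illustrative.

```python
# str.partition always returns a 3-tuple: (before, separator, after).
# If the separator is missing, the last two items are empty strings.
with_industry = "Advertising - Roseville Chase".partition(" - ")
suburb_only = "Roseville Chase".partition(" - ")

print(with_industry)  # ('Advertising', ' - ', 'Roseville Chase')
print(suburb_only)    # ('Roseville Chase', '', '')
```

An empty string is falsy in Python, so testing the middle element is enough to decide whether to return the part after the separator or the whole text.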
Upvotes: 1
Reputation: 3416
You could do this: (?<=-\s)(.*) which would return everything after the - . You can try it out here.
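A quick check in Python suggests this lookbehind only covers the case where the hyphen is present. The sketch below adds a fallback for the suburb-only case; suburb_with_fallback is a hypothetical helper, not part of the original answer.

```python
import re

LOOKBEHIND = r"(?<=-\s)(.*)"


def suburb_with_fallback(text):
    """Hypothetical helper: use the lookbehind match, else the whole string."""
    match = re.search(LOOKBEHIND, text)
    return match.group(1) if match else text


# Without a hyphen there is nothing for the lookbehind to find.
print(re.search(LOOKBEHIND, "Roseville Chase"))  # None

print(suburb_with_fallback("Advertising - Roseville Chase"))
print(suburb_with_fallback("Roseville Chase"))
```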
Upvotes: -1
Reputation: 1915
Have two groups: one for the industry plus hyphen, and one for the suburb. Make the industry group optional with a question mark.
import re
pattern = re.compile(r"([^-]*-)?(.*)")
pattern.match("Advertising - Roseville Chase").group(2)  # ' Roseville Chase'
pattern.match("Amityville").group(2)                     # 'Amityville'
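Note that group(2) in the pattern above keeps the space that follows the hyphen. One possible variation, offered as a sketch rather than the answerer's own pattern, lets the optional prefix also consume that whitespace. The names TRIMMED and suburb are assumptions made for this example.

```python
import re

# Optional, non-capturing prefix: anything up to a hyphen plus trailing
# whitespace. Whatever remains is captured as the suburb.
TRIMMED = re.compile(r"(?:[^-]*-\s*)?(.*)")


def suburb(text):
    """Return the text after the first hyphen, or all of it if there is none."""
    return TRIMMED.match(text).group(1)


print(suburb("Advertising - Roseville Chase"))  # Roseville Chase
print(suburb("Amityville"))                     # Amityville
```

Every part of this pattern can match the empty string, so match() always succeeds and the helper does not need a None check.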
Upvotes: 3