standard_error

Reputation: 191

Splitting string with optional year

I am trying to use RegEx in Python to split a string that can start with anything and may or may not end with a year in parentheses. I want two groups: the first should contain everything but the year, and the second should contain only the year, or nothing if there is no year.

This is what I have so far:

import re

string1 = 'First string'
string2 = 'Second string (2013)'

# Raw string so the backslashes reach the regex engine unchanged.
p = re.compile(r'(.*)\s*(?:\((\d{4,4})\))?')

print(p.match(string1).groups())
print(p.match(string2).groups())

This code returns:

('First string', None)
('Second string (2013)', None)

But I'm trying to get this:

('First string', None)
('Second string', '2013')

I realize that the first part of my RegEx is greedy, but I can't find a way to make it non-greedy without it matching nothing. Also, the first part of my string can contain more or less anything (including parentheses and numbers).

I realize there are ways I can work around this, but since I'm trying to learn RegEx I'd prefer a RegEx solution.

Upvotes: 0

Views: 38

Answers (2)

ArtOfWarfare

Reputation: 21478

Here's a simple method that does what you want:

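def extractYear(s):
    # Does the string end with a four-digit year in parentheses, e.g. "(2013)"?
    if len(s) >= 6 and s[-6] == '(' and s[-5:-1].isdigit() and s[-1] == ')':
        # Text before the year (trailing whitespace stripped) and the bare year.
        return s[:-6].rstrip(), s[-5:-1]
    return s, None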

No regex needed. Just check whether the string ends with a four-digit number wrapped in parentheses. If it does, return the two substrings with the proper split. If it doesn't, return the entire string and None.

Alternatively, if you insist on using regex, you could do something more like:

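import re

def extractYear(s):
    if len(s) >= 6:
        year = s[-6:]  # only the last six characters can hold "(YYYY)"
        p = re.compile(r'\(\d{4}\)')
        if p.match(year):
            # Text before the year (trailing whitespace stripped) and the bare year.
            return s[:-6].rstrip(), year[1:-1]
    return s, None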

The pattern checks for a year wrapped in parentheses. It doesn't care about anything else: we only give it the trailing six-character substring to see whether it matches.
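As a variation on the same idea, here is a minimal sketch that lets the regex find the trailing year itself instead of slicing the last six characters first. The names YEAR_AT_END and split_year are made up for this illustration; the $ anchor is what restricts the match to the end of the string.

```python
import re

# Hypothetical helper names, used only for this illustration.
# Optional whitespace, then "(YYYY)", anchored to the end of the string.
YEAR_AT_END = re.compile(r'\s*\((\d{4})\)$')

def split_year(s):
    m = YEAR_AT_END.search(s)
    if m:
        # Everything before the match, plus the captured digits.
        return s[:m.start()], m.group(1)
    return s, None

print(split_year('First string'))          # ('First string', None)
print(split_year('Second string (2013)'))  # ('Second string', '2013')
```

Because only the tail of the string is matched, parentheses and digits earlier in the title are left alone.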

Upvotes: 1

Raunak Agarwal

Reputation: 7228

Try this, which drops the trailing ? so that the year group is required: (.*)\s*(?:\((\d{4,4})\))

>>> import re
>>> string2 = "Second String (2013)"
>>> p = re.compile(r"(.*)\s*(?:\((\d{4,4})\))")
>>> p.match(string2).groups()
('Second String ', '2013')
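Note that with the year group required, a string without a year does not match at all, so p.match(...) returns None, and the first group keeps its trailing space. One possible way to cover both cases in a single pattern, sketched here as a suggestion rather than the only fix, is to make the first group non-greedy and anchor the pattern at the end with $. The anchor forces the lazy group to grow until the rest of the pattern reaches the end of the string. The name PATTERN is just for this example.

```python
import re

# Lazy first group, optional "(YYYY)" suffix, anchored at both ends.
PATTERN = re.compile(r'^(.*?)\s*(?:\((\d{4})\))?$')

print(PATTERN.match('First string').groups())          # ('First string', None)
print(PATTERN.match('Second string (2013)').groups())  # ('Second string', '2013')
```

Without the $ anchor, the lazy group would be satisfied with matching nothing, which is the behaviour described in the question.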

Upvotes: 0
