Bobo

Reputation: 9163

In Python, how can I match a string against a regular expression and get the non-matching parts as a list?

For example, I have the string "abcde2011-09-30.log" and I want to check whether it matches "(\d){4}-(\d){2}-(\d){2}" (I don't think that is the correct syntax, but you get the idea). I also need to split the string into 3 parts: (abcde), (2011-09-30), (.log). How can I do this in Python? Thanks.

Upvotes: 0

Views: 78

Answers (4)

Remi

Reputation: 21175

(Without using regex, and interpreting your string as a filename.)

Let's start by splitting the filename from the extension '.log':

import os
filename, ext = os.path.splitext('abcde2011-09-30.log')

Most probably the length of the date is always 10, allowing for:

year, month, day = [int(i) for i in filename[-10:].split('-')]
description = filename[:-10]

However, if you are not sure, we can find where the date part of the filename starts by scanning for the first digit:

for i in range(len(filename)):
    if filename[i].isdigit():
        break

description, date = filename[:i], filename[i:]
year, month, day = [int(c) for c in date.split('-')]
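
The steps above could be combined into one helper. This is only a sketch: the function name split_dated_filename is made up for illustration, and it assumes the description part of the filename never contains a digit.

```python
import os


def split_dated_filename(name):
    """Hypothetical helper: split a name like 'abcde2011-09-30.log' without regex."""
    stem, ext = os.path.splitext(name)
    # Index of the first digit in the stem; if there is none, the whole
    # stem is treated as the description and the date part is empty.
    start = next((i for i, ch in enumerate(stem) if ch.isdigit()), len(stem))
    return stem[:start], stem[start:], ext


print(split_dated_filename('abcde2011-09-30.log'))
```

Unlike the plain for loop, this version falls back to an empty date part when the stem has no digit at all.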

Upvotes: 0

David Z

Reputation: 131780

There's a split function in the re module that should work for you. Note that it returns a list.

>>> import re
>>> s = 'abcde2011-09-30.log'
>>> re.split(r'(\d{4}-\d{2}-\d{2})', s)
['abcde', '2011-09-30', '.log']

If you don't actually want the date as part of the returned list, just omit the parentheses around the regular expression so that it doesn't have a capturing group:

>>> re.split(r'\d{4}-\d{2}-\d{2}', s)
['abcde', '.log']

Be advised that if the pattern matches more than once, i.e. if there is more than one date in the filename, then this will split on both of them. For example,

>>> s2 = 'abcde2011-09-30fghij2012-09-31.log'
>>> re.split(r'(\d{4}-\d{2}-\d{2})', s2)
['abcde', '2011-09-30', 'fghij', '2012-09-31', '.log']

If this is a problem, you can use the maxsplit argument to split only once, on the first occurrence of the date:

>>> re.split(r'(\d{4}-\d{2}-\d{2})', s2, maxsplit=1)
['abcde', '2011-09-30', 'fghij2012-09-31.log']
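
The question also asks how to check whether the string matches at all. One possible way to get that from re.split is to look at the length of the result, since a string with no match comes back as a one-element list. The names DATE_RE and split_on_first_date below are hypothetical, chosen just for this sketch.

```python
import re

# Capturing group so that the date itself is kept in the result.
DATE_RE = re.compile(r'(\d{4}-\d{2}-\d{2})')


def split_on_first_date(text):
    """Return [before, date, after], or None if no date is found."""
    parts = DATE_RE.split(text, maxsplit=1)
    # With one capturing group and maxsplit=1, a successful split always
    # yields exactly three items; no match yields [text].
    return parts if len(parts) == 3 else None


print(split_on_first_date('abcde2011-09-30.log'))
print(split_on_first_date('nodate.log'))
```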

Upvotes: 2

Toto

Reputation: 91518

I don't know the exact Python regex syntax, but something like this should do the job:

/^(\D+?)([\d-]+)(\.log)$/
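
In Python the same pattern can be used without the surrounding slashes, written as a raw string. A rough sketch follows (the name LOG_RE is just for illustration). Keep in mind that [\d-]+ accepts any run of digits and hyphens, so it is looser than a real date check, and the pattern only accepts the .log extension.

```python
import re

# Same pattern as above, minus the /.../ delimiters.
LOG_RE = re.compile(r'^(\D+?)([\d-]+)(\.log)$')

m = LOG_RE.match('abcde2011-09-30.log')
# match() returns None when the string does not fit the pattern.
parts = m.groups() if m else None
print(parts)
```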

Upvotes: 0

MattH

Reputation: 38265

How's this:

>>> import re
>>> a = "abcde2011-09-30.log"
>>> myregexp = re.compile(r'^(.*)(\d{4}-\d{2}-\d{2})(\.\w+)$')
>>> m = myregexp.match(a)
>>> m
<_sre.SRE_Match object at 0xb7f69480>
>>> m.groups()
('abcde', '2011-09-30', '.log')
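
If only the non-matching parts are wanted, as in the question title, the match object's start() and end() offsets can be used to slice them out. This is a minimal sketch with made-up names (DATE_RE, non_matching_parts); it uses search rather than match, so the date can sit anywhere in the string, and it only considers the first date found.

```python
import re

DATE_RE = re.compile(r'\d{4}-\d{2}-\d{2}')


def non_matching_parts(text):
    """Return [before, after] around the first date, or None if there is none."""
    m = DATE_RE.search(text)
    if m is None:
        return None
    # Everything before the match and everything after it.
    return [text[:m.start()], text[m.end():]]


print(non_matching_parts('abcde2011-09-30.log'))
```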

Upvotes: 1
