tgxiii

Reputation: 1375

Python Regular Expressions to extract date

I have strings that look like these:

{server}_{date:YYYYMMDD}{int:######}
{server}_{date:MON DAY YYYY}{int:######}

...plus more, in different date formats. Also, there can be any number of {} blocks, and they can appear in any order.

I'm trying to get just the "date" part between the curly braces in Python 3.2. So for the first string, I want to get just "{date:YYYYMMDD}" and for the second string I want just "{date:MON DAY YYYY}". The only characters I want inside the "date" block are alpha and whitespace.

My regex pattern is:

\{date:(\w|\s)*\}

I've tested this out on this Regex builder, but it's not matching as expected. This is my output on Python:

>>> import re
>>> re.findall('\{date:(\w|\s)*\}', '{server}_{date:YYYYMMDD}{date:MONDAYYYYY}{int:######}')
['D', 'Y']
>>> re.findall('\{date:(\w|\s)*\}', '{server}_{date:MON DAY YYYY}{int:######}')
['Y']

Can someone please point out what's wrong with my pattern?

Upvotes: 5

Views: 3557

Answers (5)

Roman Bodnarchuk

Reputation: 29717

'(\{date:[\w\s]+\})' gives what you want:

>>> import re
>>> re.findall('(\{date:[\w\s]+\})', '{server}_{date:YYYYMMDD}{date:MONDAYYYYY}{int:######}')
['{date:YYYYMMDD}', '{date:MONDAYYYYY}']
>>> re.findall('(\{date:[\w\s]+\})', '{server}_{date:MON DAY YYYY}{int:######}')
['{date:MON DAY YYYY}']

If you want only the date value, use '\{date:([\w\s]+)\}'.
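One caveat: \w also matches digits and underscores, so [\w\s] is looser than the "alpha and whitespace" rule in the question. A minimal sketch of a narrower class follows, assuming "alpha" means ASCII letters; DATE_BLOCK and the sample '{date:2011_06}' block are made up for illustration:

```python
import re

# Assumption: "alpha" means ASCII letters only. DATE_BLOCK is an illustrative name.
DATE_BLOCK = re.compile(r'\{date:[A-Za-z\s]+\}')

# Hypothetical input with one valid date block and one containing digits/underscore.
text = '{server}_{date:MON DAY YYYY}{date:2011_06}{int:######}'

strict = DATE_BLOCK.findall(text)                # letters and whitespace only
loose = re.findall(r'\{date:[\w\s]+\}', text)    # \w also accepts 0-9 and _

print(strict)
print(loose)
```

Here strict keeps only '{date:MON DAY YYYY}', while loose also accepts '{date:2011_06}'.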

Upvotes: 5

user812786

Reputation: 4430

Use a capturing group around the entire regex, and a non-capturing group for the (\w|\s) part:

(\{date:(?:\w|\s)*\})

That will result in the output you want:

>>> re.findall('(\{date:(?:\w|\s)*\})', '{server}_{date:MON DAY YYYY}{int:######}')
['{date:MON DAY YYYY}']
>>> re.findall('(\{date:(?:\w|\s)*\})', '{server}_{date:YYYYMMDD}{date:MONDAYYYYY}{int:######}')
['{date:YYYYMMDD}', '{date:MONDAYYYYY}']
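As for why the original pattern returned single letters: when a pattern contains exactly one capturing group, re.findall returns that group rather than the whole match, and a group that repeats keeps only its last iteration. A short sketch of the difference, using the string from the question:

```python
import re

text = '{server}_{date:YYYYMMDD}{date:MONDAYYYYY}{int:######}'

# One capturing group: findall returns the group's content, and a repeated
# group holds only the last character it matched.
last_chars = re.findall(r'\{date:(\w|\s)*\}', text)

# No capturing groups: findall returns each whole match.
whole = re.findall(r'\{date:(?:\w|\s)*\}', text)

print(last_chars)
print(whole)
```

The first call reproduces the ['D', 'Y'] from the question; the second returns both full date blocks.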

Upvotes: 0

matchew

Reputation: 19645

Try this:

import re
s = '{server}_{date:MON DAY YYYY}{int:######}'
re.findall(r'\{date:.*\}(?=\{)', s)

It returns:

['{date:MON DAY YYYY}']

And:

s = '{server}_{date:YYYYMMDD}{int:######}'
re.findall(r'\{date:.*\}(?=\{)', s)

returns the following:

['{date:YYYYMMDD}']

The (?=\{) part does the following:

(?=...) Matches if ... matches next, but doesn’t consume any of the string. This is called a lookahead assertion. For example, Isaac (?=Asimov) will match 'Isaac ' only if it’s followed by 'Asimov'. (source)

Note: this only works if another {..} block follows {date}. I assume one always does; if it is missing, your input may be invalid.
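A further limitation of the lookahead pattern is that .* is greedy, so when more than one block follows the date it swallows every block except the last. The sketch below shows this alongside one possible alternative, [^}]*, which stops at the first closing brace; the '{host}' block and the variable names are invented for illustration:

```python
import re

greedy = r'\{date:.*\}(?=\{)'   # the lookahead pattern from this answer
bounded = r'\{date:[^}]*\}'     # alternative: stop at the first closing brace

# Hypothetical inputs: two blocks after the date, and the date as the last block.
two_after = '{server}_{date:YYYYMMDD}{int:######}{host}'
date_last = '{server}_{date:YYYYMMDD}'

greedy_two = re.findall(greedy, two_after)     # over-matches into {int:...}
greedy_last = re.findall(greedy, date_last)    # no following block, no match
bounded_two = re.findall(bounded, two_after)
bounded_last = re.findall(bounded, date_last)

print(greedy_two, greedy_last)
print(bounded_two, bounded_last)
```

The bounded version does not depend on what follows the date block, at the cost of accepting any character except '}' inside it.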

Upvotes: 1

Samuel

Reputation: 2490

>>> re.findall('\{date:([\w\s]*)\}', '{server}_{date:YYYYMMDD}{date:MONDAYYYYY}{int:######}')
['YYYYMMDD', 'MONDAYYYYY']

Upvotes: 2

eyquem

Reputation: 27575

'{server}_({date:.+?}){int:'

is enough.

Or, perhaps better:

'(?<={server}_)({date:.+?})(?={int:)'

Upvotes: 0
