Sumit

Reputation: 2250

Date time matching using regex

I have a date-time string t1:

'Sat 02 May 2015 19:54:36 +0530'

I want to remove the first and last words, i.e. Sat and +0530. Here is the behavior of the three regexes I wrote:

(1) re.search(r'(\d{2})([^:]+)([:\d{2}]+)',t1) matches '02 May 2015 19:54:36'
(2) re.search(r'(\d{2})([^:]+)([:\d{2}]{2})',t1) matches '02 May 2015 19:5'
(3) re.search(r'(\d{2})(.+)([\:\d{2}])',t1) matches '02 May 2015 19:54:36 +0530'

Can someone explain what's the problem with number 2 and number 3? I thought all of these should yield the same result.

Upvotes: 0

Views: 144

Answers (2)

elethan

Reputation: 16993

The title of your question relates to regex, but it seems that your question is really about how to remove the first and last word from a date string. In your case, I personally would not use regex. Instead you could simply split the string on spaces, and join the resultant list, leaving out the first and last element:

In [1]: s = 'Sat 02 May 2015 19:54:36 +0530'

In [2]: ' '.join(s.split(' ')[1:-1])
Out[2]: '02 May 2015 19:54:36'

[1:-1] will give you all elements of a sequence (in this case a list of strings created by split()) from the second element, up to (but not including) the last element.
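As a minimal sketch of the same idea wrapped in a function (the name drop_first_last_word is made up for illustration), calling split() with no argument also tolerates repeated whitespace between words:

```python
def drop_first_last_word(text):
    """Return text without its first and last whitespace-separated words."""
    words = text.split()          # no argument: split on any run of whitespace
    return ' '.join(words[1:-1])  # keep everything between the first and last word


s = 'Sat 02 May 2015 19:54:36 +0530'
print(drop_first_last_word(s))  # 02 May 2015 19:54:36
```

Note that a string with fewer than three words comes back empty, which may or may not be what you want.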

Regex is not the "wrong" way to solve your problem, and mine is not "right". However, I find that, where applicable, string methods are often better suited for this kind of job, are easier to read, and are less error-prone. That has been my experience at least.

Upvotes: 1

Rahul

Reputation: 2738

Can someone explain what's the problem with number 2 and number 3?

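The problem with your second regex, (\d{2})([^:]+)([:\d{2}]{2}), is the character class in the third group, ([:\d{2}]{2}). Inside square brackets {2} is not a quantifier, so the class matches any one of these characters: :, a digit, { or }, and the outer {2} repeats it exactly twice. Hence it matches :5 and stops. The third regex has the same character-class mistake: its greedy .+ runs to the end of the string, and the class then matches only the final 0.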

Your first regex, (\d{2})([^:]+)([:\d{2}]+), works only because you used the + (one or more) quantifier, which consumes :54:36 since all of those characters are in the character class [:\d{2}].
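To see this for yourself, here is a small sketch that runs the three patterns from the question against t1 and prints what the third group captured each time. The variable names (char_class, accepted, results) are just for illustration.

```python
import re

t1 = 'Sat 02 May 2015 19:54:36 +0530'

# Inside [...], "{2}" is not a quantifier: the class accepts ':', any digit, '{' and '}'.
char_class = re.compile(r'[:\d{2}]')
accepted = [c for c in ':5{}' if char_class.fullmatch(c)]
print(accepted)  # [':', '5', '{', '}']

patterns = {
    1: r'(\d{2})([^:]+)([:\d{2}]+)',    # class repeated one or more times
    2: r'(\d{2})([^:]+)([:\d{2}]{2})',  # class repeated exactly twice
    3: r'(\d{2})(.+)([\:\d{2}])',       # greedy .+ followed by a single class character
}

results = {}
for number, pattern in patterns.items():
    m = re.search(pattern, t1)
    results[number] = (m.group(0), m.group(3))
    print(number, repr(m.group(0)), 'third group:', repr(m.group(3)))
```

The third group comes out as ':54:36', ':5' and '0' respectively, which lines up with the three matches reported in the question.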

Removing the character class, your second regex becomes (\d{2})([^:]+)(:\d{2}){2}, which matches as intended.
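A quick check of the corrected pattern in Python, as a sketch (the variable name trimmed is only for illustration):

```python
import re

t1 = 'Sat 02 May 2015 19:54:36 +0530'

# (:\d{2}){2} means: a colon followed by two digits, repeated exactly twice.
m = re.search(r'(\d{2})([^:]+)(:\d{2}){2}', t1)
trimmed = m.group(0)
print(trimmed)  # 02 May 2015 19:54:36
```

One thing to keep in mind: a repeated capturing group keeps only its last repetition, so m.group(3) is ':36' here. Use m.group(0) if you want the whole matched text.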


Upvotes: 2
