Reputation: 2250
I have a date-time string t1:
'Sat 02 May 2015 19:54:36 +0530'
I want to remove the first and last word, i.e. Sat and +0530. Here is the behavior of the three regexes I wrote:
(1) re.search(r'(\d{2})([^:]+)([:\d{2}]+)',t1) matches '02 May 2015 19:54:36'
(2) re.search(r'(\d{2})([^:]+)([:\d{2}]{2})',t1) matches '02 May 2015 19:5'
(3) re.search(r'(\d{2})(.+)([\:\d{2}])',t1) matches '02 May 2015 19:54:36 +0530'
Can someone explain what's the problem with number 2 and number 3? I thought all of these should yield the same result.
Upvotes: 0
Views: 144
Reputation: 16993
The title of your question relates to regex, but it seems that your question is really about how to remove the first and last word from a date string. In your case, I personally would not use regex. Instead you could simply split the string on spaces, and join the resultant list, leaving out the first and last element:
In [1]: s = 'Sat 02 May 2015 19:54:36 +0530'
In [2]: ' '.join(s.split(' ')[1:-1])
Out[2]: '02 May 2015 19:54:36'
[1:-1] will give you all elements of a sequence (in this case a list of strings created by split()) from the second element, up to (but not including) the last element.
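As a small sketch of how the split, slice, and join steps fit together, assuming the words are separated by whitespace (the helper name strip_first_last is hypothetical, chosen only for illustration):

```python
def strip_first_last(text):
    """Drop the first and last whitespace-separated word of text."""
    words = text.split()           # split on any run of whitespace
    return ' '.join(words[1:-1])   # keep everything between the first and last word


t1 = 'Sat 02 May 2015 19:54:36 +0530'
print(strip_first_last(t1))  # 02 May 2015 19:54:36
```

Calling split() with no argument also tolerates repeated spaces, which split(' ') would turn into empty strings in the list.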
Regex is not the "wrong" way to solve your problem, and mine is not "right". However, I find that, where applicable, string methods are often better suited for this kind of job, are easier to read, and are less error-prone. That has been my experience at least.
Upvotes: 1
Reputation: 2738
Can someone explain what's the problem with number 2 and number 3?
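The problem with your second regex, (\d{2})([^:]+)([:\d{2}]{2}), is that the third group, ([:\d{2}]{2}), uses a character class. Inside square brackets, {2} is not a quantifier, so the class matches any single one of these characters: :, a digit, {, 2, or }. The {2} after the class then asks for exactly two such characters, so the group matches :5 and stops. The third regex has the same problem: its class [\:\d{2}] matches only one character, and because .+ is greedy it runs to the end of the string and leaves just the final 0 for the class, which is why +0530 ends up in the match.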
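To see this for yourself, here is a minimal check (the variable names are just for illustration) showing that the class accepts each of those literal characters, and what the third group actually captures in regexes 2 and 3:

```python
import re

char_class = r'[:\d{2}]'

# Inside [...] the braces and the 2 are literal characters, not a quantifier,
# so each of these single characters is accepted by the class.
for ch in [':', '7', '{', '2', '}']:
    assert re.fullmatch(char_class, ch) is not None

t1 = 'Sat 02 May 2015 19:54:36 +0530'

# Regex 2: the class is repeated exactly twice, so group 3 is two characters.
second = re.search(r'(\d{2})([^:]+)([:\d{2}]{2})', t1)
print(second.group(3))  # :5

# Regex 3: greedy .+ runs to the end and gives back a single character.
third = re.search(r'(\d{2})(.+)([\:\d{2}])', t1)
print(third.group(3))  # 0
```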
Your first regex, (\d{2})([^:]+)([:\d{2}]+), works because the + quantifier (one or more) lets the class consume all of :54:36, since every one of those characters is in the character class [:\d{2}].
If you replace the character class with a plain group, your second regex becomes (\d{2})([^:]+)(:\d{2}){2}, which works just fine.
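A quick sketch of the corrected pattern in action. One thing to be aware of: a repeated capturing group keeps only its last repetition, so the second pattern below (my own variation, not something from the question) wraps the repetition in a non-capturing group in case you want the whole time suffix in group 3:

```python
import re

t1 = 'Sat 02 May 2015 19:54:36 +0530'

# (:\d{2}){2} means: a colon followed by two digits, repeated twice.
fixed = re.search(r'(\d{2})([^:]+)(:\d{2}){2}', t1)
print(fixed.group(0))  # 02 May 2015 19:54:36
print(fixed.group(3))  # :36 (only the last repetition is kept)

# Variation: repeat a non-capturing group inside one capturing group.
whole = re.search(r'(\d{2})([^:]+)((?::\d{2}){2})', t1)
print(whole.group(3))  # :54:36
```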
Upvotes: 2