Reputation: 749
This community has helped me immensely with my previous regex questions. I now have a question about combining these two regular expressions.
My goal is for the regex to match: date OR date timestamp
date = (\d{1,2}|[a-zA-Z]{2,8})(?:[/-]{1})(\d{1,2}|[a-zA-Z]{2,8})(?:[/-]{1})(\d*)
timestamp = (\d{1,2})(?:[:]{1})(\d{1,2})(?:[:]{1})(\d{1,2})
I am not able to combine these two into a single regex. Any help would be great!
Upvotes: 0
Views: 154
Reputation: 4614
First off, I would recommend making your patterns simpler. They contain a lot of redundancy, and what appear to be a few oversights.
Your timestamp pattern: (\d{1,2})(?:[:]{1})(\d{1,2})(?:[:]{1})(\d{1,2})
I'm going to go ahead and assume you do need the capturing groups so you can return the hours/minutes/seconds later in your program, but for what it's worth the non-capturing groups serve no purpose in this regex, since each one wraps a single element that needs no grouping. Therefore the non-capturing groups can be removed.
(\d{1,2})[:]{1}(\d{1,2})[:]{1}(\d{1,2})
There is no reason to put : inside square brackets, since it is only one character and it has the same meaning both inside and outside of brackets (as opposed to . for example). Also, {1} is redundant in all situations.
(\d{1,2}):(\d{1,2}):(\d{1,2})
It's up to personal opinion, but I prefer to write the token twice followed by a ? instead of using {1,2}. Also, I'm guessing it's an oversight that you're allowing only one digit for the seconds. That would be pretty strange.
(\d\d?):(\d\d?):(\d\d)
Much nicer, right?
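As a quick sanity check, here is a minimal Python sketch comparing the original and simplified timestamp patterns. The helper name `parts` and the sample strings are made up for illustration, and it assumes whole-string matching is what you want.

```python
import re

# Original and simplified timestamp patterns from the discussion above.
ORIGINAL = re.compile(r"(\d{1,2})(?:[:]{1})(\d{1,2})(?:[:]{1})(\d{1,2})")
SIMPLIFIED = re.compile(r"(\d\d?):(\d\d?):(\d\d)")


def parts(pattern, text):
    """Return the captured groups if the whole string matches, else None."""
    m = pattern.fullmatch(text)
    return m.groups() if m else None


# Hypothetical sample inputs, not taken from the question.
for sample in ["9:05:30", "12:34:56", "12:34:5", "12-34-56"]:
    print(sample, parts(ORIGINAL, sample), parts(SIMPLIFIED, sample))
```

The only behavioural difference is the last field: the simplified pattern requires exactly two digits there, so "12:34:5" is accepted by the original and rejected by the simplified version.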
Now let's look at your "date" pattern:
(\d{1,2}|[a-zA-Z]{2,8})(?:[/-]{1})(\d{1,2}|[a-zA-Z]{2,8})(?:[/-]{1})(\d*)
Just going to quickly apply all of the changes that I mentioned for the first pattern.
(\d\d?|[a-zA-Z]{2,8})[/-](\d\d?|[a-zA-Z]{2,8})[/-](\d*)
I'm curious about whether or not you actually need to check for a string possibly made up of letters in both the first and second parts. Usually it's one or the other depending on the region, but rarely a mixture of both within the same program. I'm going to go ahead and remove the second part that checks for this, but of course go ahead and add it back in if you need it. Anyway, the \d* at the end looks like it could be a problem. I doubt you want the year to consist of 0, 1, or more than 4 digits.
(\d\d?|[a-zA-Z]{2,8})[/-](\d\d?)[/-](\d{2,4})
(You probably don't want the year to consist of 3 digits either, but this is probably good enough.)
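Here is a small Python sketch of how the simplified date pattern behaves. The function name `parse_date` and the sample strings are hypothetical, and it assumes you match the whole string rather than search inside a longer one.

```python
import re

# Simplified date pattern from above: number or 2-8 letters, then day, then year.
DATE = re.compile(r"(\d\d?|[a-zA-Z]{2,8})[/-](\d\d?)[/-](\d{2,4})")


def parse_date(text):
    """Return (first, second, year) if the whole string is a date, else None."""
    m = DATE.fullmatch(text)
    return m.groups() if m else None


# Hypothetical sample inputs, not taken from the question.
samples = [
    "7/23/1999",         # numeric month
    "July-23-99",        # month name, two-digit year
    "7/23-1999",         # mixed delimiters are still accepted
    "7/23/9",            # one-digit year is rejected
    "September-1-2020",  # nine letters exceeds the {2,8} limit
]
for s in samples:
    print(s, parse_date(s))
```

Two things to keep in mind: the two delimiters are matched independently, so "7/23-1999" passes, and the {2,8} limit from the original pattern is too short for a full "September".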
Now that we have these two simplified patterns, the question is how to combine them. The most straightforward and consistent way is to just put the two of them together, separated by a |.
(\d\d?|[a-zA-Z]{2,8})[/-](\d\d?)[/-](\d{2,4})|(\d\d?):(\d\d?):(\d\d)
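With the alternation, only one side's capturing groups are filled on any given match: groups 1-3 for a date, groups 4-6 for a timestamp. A minimal Python sketch of telling them apart (the `classify` helper and its return shape are my own invention, not something from the question):

```python
import re

# The two simplified patterns joined with |, as shown above.
COMBINED = re.compile(
    r"(\d\d?|[a-zA-Z]{2,8})[/-](\d\d?)[/-](\d{2,4})"
    r"|(\d\d?):(\d\d?):(\d\d)"
)


def classify(text):
    """Return ("date", groups) or ("time", groups), or None if nothing matches."""
    m = COMBINED.fullmatch(text)
    if m is None:
        return None
    # Groups belonging to the alternative that did not match are None.
    if m.group(1) is not None:
        return ("date", m.group(1, 2, 3))
    return ("time", m.group(4, 5, 6))


print(classify("7/23/1999"))
print(classify("12:34:56"))
print(classify("hello"))
```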
However, since they're so similar to each other, it's probably safe to mix them together by just adding the : delimiter to the date pattern.
(\d\d?|[a-zA-Z]{2,8})[:/-](\d\d?)[:/-](\d{2,4})
Note that this could make some unexpected matches. For example, July:23-1999. The potential mismatch between delimiters is already somewhat inherent in your "date" pattern, but is now made worse by the addition of the :. If this is a concern, you can capture the first delimiter and then match for it when you need it again.
(\d\d?|[a-zA-Z]{2,8})([:/-])(\d\d?)\2(\d{2,4})
However, note that this will change the order of your capturing groups, so if your program was relying on \1, \2, and \3, it will now need to use \1, \3, and \4.
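To illustrate the difference, here is a short Python sketch comparing the merged pattern with and without the backreference. The names `LOOSE` and `STRICT` are just labels I picked, and the inputs are made-up examples.

```python
import re

# Merged pattern: each delimiter is matched independently.
LOOSE = re.compile(r"(\d\d?|[a-zA-Z]{2,8})[:/-](\d\d?)[:/-](\d{2,4})")
# Backreference version: \2 must repeat whatever delimiter group 2 captured.
STRICT = re.compile(r"(\d\d?|[a-zA-Z]{2,8})([:/-])(\d\d?)\2(\d{2,4})")

for text in ["July:23-1999", "July-23-1999", "12:34:56"]:
    loose = LOOSE.fullmatch(text)
    strict = STRICT.fullmatch(text)
    print(text, bool(loose), bool(strict))

# The values now live in groups 1, 3 and 4; group 2 holds the delimiter.
m = STRICT.fullmatch("July-23-1999")
print(m.group(1, 3, 4), repr(m.group(2)))
```

Checking group 2 also gives you a cheap way to tell a timestamp (delimiter ":") from a date, if you need that distinction back.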
In action, with valid and invalid data: https://regex101.com/r/cRAw1Y/1
Upvotes: 2
Reputation: 1310
final = '(' + date + ')|(' + date + ')(' + timestamp + ')'
If we also suppose we have a regex for the separator between the date and timestamp, we can just use
final = '(' + date + ')|((' + date + ')(' + separator + ')(' + timestamp + '))'
If this doesn't work for you, please explain why.
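One thing to watch for, at least in Python's `re` module: alternatives are tried left to right, so when searching, the bare date alternative wins first and the timestamp is never included. A possible workaround is to make the timestamp an optional suffix of the date instead. The sketch below assumes the simplified patterns from the other answer and a whitespace separator; both are assumptions, so substitute your own.

```python
import re

# Assumed building blocks (simplified patterns; separator is a guess).
date = r"(\d\d?|[a-zA-Z]{2,8})[/-](\d\d?)[/-](\d{2,4})"
timestamp = r"(\d\d?):(\d\d?):(\d\d)"
separator = r"\s+"

# Alternation with the separator, built as described above.
alternation = re.compile(
    "(" + date + ")|((" + date + ")(" + separator + ")(" + timestamp + "))"
)
# Alternative sketch: a date followed by an optional separator + timestamp.
final = re.compile(date + "(?:" + separator + timestamp + ")?")

text = "7/23/1999 12:34:56"
print(alternation.search(text).group(0))  # stops after the date
print(final.search(text).group(0))        # includes the timestamp
print(final.fullmatch("7/23/1999").groups())
```

With the optional-suffix form, groups 1-3 always hold the date and groups 4-6 are None when no timestamp is present.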
Upvotes: 1