Reputation: 10840
I want to parse data which might contain mixed patterns like
1-4pm
1pm-5pm
noon to 11pm
noon to midnight
etc.
I want to extract the start and end times. How can I achieve this with a regex? I know I can't support every possible input format, but how can I support as many as possible?
This is my expression:
^((?<dayPart>[a-z]+)?)\s*(?<startTime>[0-9]{1,2}[:]?[0-9]{0,2}\s*[am|pm|a.m|p.m]*[.]*)?\s*[-|to|\\|/|=]*\s*((?<endPart>[a-z]+)?|(?<endTime>[0-9]{1,2}[:]?[0-9]{0,2}\s*[am|pm|a.m|p.m]*[.]*))?$
which covers almost all combinations.
I just want to know whether this regex can be optimized.
Here dayPart will consume all leading non-digit characters, to handle a time span that starts with noon, midnight, etc., or any value we can ignore, such as Sunday. startTime will try to consume a time in any format, if one is there. The same goes for endPart and endTime.
Upvotes: 1
Views: 1514
Reputation: 385910
First, define a pattern that matches a single point in time. Given your examples it might be something like:
(noon|midnight|[0-9]+\s?(am|pm)?)
Next, define the separator. Perhaps:
(to|\-)
Finally, combine two of the first with one of the second. Assuming your language supports variables, something like:
set timePattern {(noon|midnight|[0-9]+\s?(am|pm)?)}
set separator {(to|\-)}
set fullPattern "$timePattern(\s*$separator\s*$timePattern)?"
Once you pass that through the engine you should be able to get at the parts of the expression that matched. You might need to make some groups non-capturing but I'll leave that as an exercise for the reader. You'll then likely have to parse the individual parts to figure out the time. For example, parse "1pm" as a 1 and "pm" and calculate a time based on that.
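As a rough illustration of the steps above, here is a minimal Python sketch rather than a definitive implementation. The names TIME, SEP, FULL, split_range and to_hour are made up for this example, the inner groups are made non-capturing, and the optional space is moved inside the am/pm group so a captured piece never ends with a space. It only handles the formats from the question's examples.

```python
import re

# One point in time; inner groups are non-capturing so only whole pieces are captured.
TIME = r"(?:noon|midnight|\d+(?:\s?(?:am|pm))?)"
SEP = r"(?:to|-)"
FULL = re.compile(rf"^\s*({TIME})(?:\s*{SEP}\s*({TIME}))?\s*$", re.IGNORECASE)


def split_range(text):
    """Return (start, end) as matched strings; end is None if only one time is given."""
    m = FULL.match(text)
    if m is None:
        return None
    return m.group(1), m.group(2)


def to_hour(part):
    """Turn one matched piece such as '1pm' or 'noon' into an hour from 0 to 23."""
    word = part.lower()
    if word == "noon":
        return 12
    if word == "midnight":
        return 0
    m = re.fullmatch(r"(\d+)\s?(am|pm)?", word)
    if m is None:
        raise ValueError(f"not a time: {part!r}")
    hour, meridiem = int(m.group(1)), m.group(2)
    if meridiem == "pm" and hour != 12:
        hour += 12
    elif meridiem == "am" and hour == 12:
        hour = 0
    return hour
```

A piece without a suffix, such as the "1" in "1-4pm", is returned as-is here; whether it should inherit "pm" from the end time is a policy decision this sketch leaves open.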
Once you have it broken down like that it makes it easier to fiddle with the constituent parts and makes the expression a bit more comprehensible. Though, the same thing can be accomplished in some languages that support multiline expressions with comments.
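For the multiline-with-comments alternative, Python's re.VERBOSE flag is one example. The sketch below assumes the same time and separator pieces as above; the group names start and end are chosen only for illustration.

```python
import re

RANGE = re.compile(
    r"""
    ^\s*
    (?P<start> noon | midnight | \d+ (?: \s? (?: am | pm ) )? )    # first point in time
    (?:
        \s* (?: to | - ) \s*                                       # separator
        (?P<end> noon | midnight | \d+ (?: \s? (?: am | pm ) )? )  # second point in time
    )?                                                             # the whole tail is optional
    \s*$
    """,
    re.VERBOSE | re.IGNORECASE,
)


def match_range(text):
    """Return (start, end) as strings, or None if the text does not match."""
    m = RANGE.match(text)
    return m.group("start", "end") if m else None
```

In verbose mode, whitespace inside the pattern is ignored and # starts a comment, so literal spaces have to be written as \s or escaped.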
Upvotes: 2
Reputation: 12027
Depending on language, you can 'build-up' a matching pattern. Ruby, for example, will allow you to do something like:
time_spec = /noon|midnight|\d{1,2}/
sep = /-|to/
match = /#{time_spec}\s*#{sep}\s*#{time_spec}/
But since this seems like something that will get much more complex as it is extended, why not create some sort of parser (using flex/yacc?) that will be much easier to maintain than a regex? Once you start supporting a range of inputs such as 1pm/1p/13:00/13, regexes start creating more problems than solutions.
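To give a feel for the parser route without pulling in flex/yacc, here is a small hand-written tokenizer and parser in Python. It is only a sketch under assumptions: the token shapes, the accepted suffixes (a, am, a.m., p, pm, p.m.) and the function names are invented for this example, and a time without a suffix is read as a 24-hour value.

```python
import re

TOKEN = re.compile(r"(\d{1,2})(?::(\d{2}))?|([a-z.]+)|(-)", re.IGNORECASE)
NAMED = {"noon": (12, 0), "midnight": (0, 0)}
MERIDIEMS = {"a": "am", "am": "am", "p": "pm", "pm": "pm"}


def tokenize(text):
    """Split the input into ('num', h, m), ('word', w) and ('sep',) tokens."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if m.group(1) is not None:
            tokens.append(("num", int(m.group(1)), int(m.group(2) or 0)))
        elif m.group(3) is not None:
            # Dots are dropped so that 'a.m.' and 'am' look the same.
            tokens.append(("word", m.group(3).lower().replace(".", "")))
        else:
            tokens.append(("sep",))
        pos = m.end()
    return tokens


def parse_point(tokens, i):
    """Parse one time starting at tokens[i]; return ((hour, minute), next_index)."""
    if i >= len(tokens):
        raise ValueError("expected a time")
    tok = tokens[i]
    if tok[0] == "word" and tok[1] in NAMED:
        return NAMED[tok[1]], i + 1
    if tok[0] != "num":
        raise ValueError(f"expected a time, got {tok!r}")
    hour, minute = tok[1], tok[2]
    i += 1
    if i < len(tokens) and tokens[i][0] == "word" and tokens[i][1] in MERIDIEMS:
        # 12am -> 0, 12pm -> 12, 1pm -> 13
        hour = hour % 12 + (12 if MERIDIEMS[tokens[i][1]] == "pm" else 0)
        i += 1
    if hour > 23 or minute > 59:
        raise ValueError("time out of range")
    return (hour, minute), i


def parse_range(text):
    """Parse 'TIME (-|to) TIME' into ((h1, m1), (h2, m2)) in 24-hour time."""
    tokens = tokenize(text)
    start, i = parse_point(tokens, 0)
    if i >= len(tokens) or tokens[i] not in (("sep",), ("word", "to")):
        raise ValueError("expected '-' or 'to'")
    end, i = parse_point(tokens, i + 1)
    if i != len(tokens):
        raise ValueError("unexpected trailing input")
    return start, end
```

Adding a new input form then means touching one function (for example, a new suffix in MERIDIEMS) instead of reworking a single large expression.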
Upvotes: 1
Reputation: 60398
Without much to go on, it looks like you can split based on "-" or "to".
^(.+) ?(-|to) ?(.+)$
That will capture the start time in the first group and the end time in the third. If you want a more specific syntax, you'll have to specify which variant of regex you intend to use.
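As a hedged example in one concrete flavour (Python's re module, chosen only for illustration), the same idea can be written as a split. Note that the greedy first group in the pattern above can keep a trailing space (for "noon to midnight" it captures "noon "), so this sketch lets the separator absorb the surrounding whitespace instead. The name split_times is hypothetical.

```python
import re

# '-' or the whole word 'to', together with any whitespace around it.
SEPARATOR = re.compile(r"\s*(?:-|\bto\b)\s*", re.IGNORECASE)


def split_times(text):
    """Return (start, end) split at the first separator, or None if there is none."""
    parts = SEPARATOR.split(text.strip(), maxsplit=1)
    if len(parts) != 2 or not all(parts):
        return None
    return parts[0], parts[1]
```

The \b word boundaries keep "to" from matching inside a longer word; the two halves still need to be parsed separately to get actual times.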
Upvotes: 0