Reputation: 1
Can someone help me with the following?
I'm trying to find specific date and time strings in a text (to be used within VBA Word). Currently working with the following RegEx string:
(?:([0-9]{1,2})[ |-])?(?:(jan(?:uari)?|feb(?:ruari)?|m(?:aa)?rt|apr(?:il)?|mei|jun(?:i)?|jul(?:i)?|aug(?:ustus)?|sep(?:tember|t)?|okt(?:ober)?|nov(?:ember)?|dec(?:ember)?))?(?: |-)?(?(3)(?: around | at | ))?(?:([0-9]{1,2}:[0-9]{1,2})?(?: uur| u|u)?)?
Tested output on following text:
Rules:
jun '18
example at: [https://regex101.com/r/6CFgBP/1/]
Expected output (when used in VBA Word): a regex Matches collection object in which each Match.SubMatches contains the individual items d, m, y, hh:mm from the capture groups in the regex search string. So for example 1, the SubMatches (or capture groups) contain the values: '26', 'sep', '2016', '09:00'
The RegEx works fine, but some false positives need to be excluded:
(I was trying with some lookahead and the references \1 and ?(1), but was not able to get it running properly...)
Any advice highly appreciated!
Upvotes: 0
Views: 311
Reputation: 1
Finally I found something that helps me handle the month properly :-)
\b(?:([1-3]|[0-3]\d)[ |-](?'month'(?:[1-9]|\d[12])|(?:jan(?:uari)?|feb(?:ruari)?|m(?:aa)?rt|apr(?:il)?|mei|jun(?:i)?|jul(?:i)?|aug(?:ustus)?|sep(?:tember|t)?|okt(?:ober)?|nov(?:ember)?|dec(?:ember)?))?)?(?:(\g'month')[ |-]((?:19|20|\')(?:\d{2})))?\b(?: omstreeks | om | )?(?:(\d{1,2}[:]\d{2}(?: uur|u)?|[0-2]\d{3}(?: uur|u)))?\b
It uses a named group that is called again as a subroutine. Found here: https://www.regular-expressions.info/subroutine.html
Upvotes: 0
Reputation: 30971
As I understand it, you require that every date/time part (day, month, year, hour and minute) be present.
So you should remove the ? after the relevant groups (they are not optional).
It is also good practice to capture each part in its own capturing group.
There is no need to write something like jun(?:i)?. It is enough (and easier to read) to write just juni? (the ? applies only to the preceding i).
Another hint: as the regex language contains the \d char class, use it instead of [0-9] (the regex is shorter and easier to read).
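A quick way to convince yourself that the two simplifications do not change what is matched is to compare the old and new fragments on a few strings. This is only a sketch: it uses Python's re module as a stand-in for the VBA regex engine, and the helper name same_matches and the sample strings are made up for illustration.

```python
import re

def same_matches(old, new, samples):
    """Return True if both patterns accept exactly the same samples (whole-string match)."""
    return all(
        bool(re.fullmatch(old, s)) == bool(re.fullmatch(new, s))
        for s in samples
    )

# Hypothetical sample strings, not taken from the question.
SAMPLES = ["jun", "juni", "june", "jul", "7", "42", "123", "a1"]

# jun(?:i)? versus juni?
MONTH_OK = same_matches(r"jun(?:i)?", r"juni?", SAMPLES)

# [0-9]{1,2} versus \d{1,2}
# Note: in Python 3, \d also accepts non-ASCII digits unless re.ASCII is set;
# the samples here are plain ASCII, so the comparison is not affected.
DIGIT_OK = same_matches(r"[0-9]{1,2}", r"\d{1,2}", SAMPLES)

print(MONTH_OK, DIGIT_OK)
```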
Optional parts (at / around) should form an optional, non-capturing group.
Anything after the minute part is not needed in the regex.
So I propose a regex like below (for readability, I divided it into rows):
(\d{1,2})[ -](jan(?:uari)?|feb(?:ruari)?|m(?:aa)?rt|apr(?:il)?|mei|juni?
|juli?|aug(?:ustus)?|sep(?:tember|t)?|okt(?:ober)?|nov(?:ember)?|dec(?:ember)?)
[ -](\d{4}) (?:around |at )?(\d{1,2}:\d{1,2})
Details:
(\d{1,2}) - Day.
[ -] - A separator after the day (either a space or a minus).
(jan(?:uari)?|...dec(?:ember)?) - Month.
[ -] - A separator after the month.
(\d{4}) - Year.
(?:around |at )? - Actually, 3 variants of a separator between year and hour (space / around / at); note the space before (...)?.
(\d{1,2}:\d{1,2}) - Hour and minute.

It matches variants 1, 2, 3, 5 and 13. All the remaining variants lack at least one required part, so they are not matched.
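To see how the four capture groups line up with day, month, year and hh:mm, here is a minimal sketch of the regex above. It assumes Python's re module rather than VBA (in VBA the same groups would be read from Match.SubMatches, which is zero-based), and the sample strings and the helper name parts are invented for illustration.

```python
import re

# The regex from the answer, laid out with re.VERBOSE so it can span rows.
# In verbose mode a literal space has to sit inside a character class.
DATE_TIME = re.compile(r"""
    (\d{1,2})                       # group 1: day
    [ -]                            # separator: space or minus
    (jan(?:uari)?|feb(?:ruari)?|m(?:aa)?rt|apr(?:il)?|mei|juni?
     |juli?|aug(?:ustus)?|sep(?:tember|t)?|okt(?:ober)?
     |nov(?:ember)?|dec(?:ember)?)  # group 2: month
    [ -]                            # separator: space or minus
    (\d{4})                         # group 3: year
    [ ](?:around[ ]|at[ ])?         # a space, optionally followed by "around " or "at "
    (\d{1,2}:\d{1,2})               # group 4: hour:minute
""", re.VERBOSE | re.IGNORECASE)

def parts(text):
    """Return (day, month, year, hh:mm) for the first match, or None."""
    m = DATE_TIME.search(text)
    return m.groups() if m else None

# Hypothetical inputs: two complete date/times and one incomplete date.
print(parts("26 sep 2016 09:00"))
print(parts("3-maart-2017 at 14:30"))
print(parts("jun '18"))
```

Because every part is mandatory here, an input that lacks the day, the year or the time simply produces no match.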
If, for example, you allow the hour/minute part to be optional, change the respective fragment into:
( (?:around |at )?(\d{1,2}:\d{1,2}))?
i.e. surround the space/around/at / hour / minute part with ( and )?, making this part an optional group. Then, variants 14 and 15 will also be matched.
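One side effect of wrapping the time part in ( ... )? is that the wrapper is itself a capture group, so hh:mm moves from group 4 to group 5. The sketch below illustrates this, assuming Python's re module as a stand-in for VBA; the names MONTH, OPTIONAL_TIME and parts, and the sample strings, are made up for the example.

```python
import re

# Month alternation copied from the answer's regex.
MONTH = (r"jan(?:uari)?|feb(?:ruari)?|m(?:aa)?rt|apr(?:il)?|mei|juni?|juli?"
         r"|aug(?:ustus)?|sep(?:tember|t)?|okt(?:ober)?|nov(?:ember)?|dec(?:ember)?")

OPTIONAL_TIME = re.compile(
    r"(\d{1,2})[ -](" + MONTH + r")[ -](\d{4})"   # groups 1-3: day, month, year
    r"( (?:around |at )?(\d{1,2}:\d{1,2}))?",      # group 4: wrapper, group 5: hh:mm
    re.IGNORECASE,
)

def parts(text):
    """Return all five groups of the first match, or None if nothing matches."""
    m = OPTIONAL_TIME.search(text)
    return m.groups() if m else None

# Hypothetical inputs: a date without a time, and one with "at hh:mm".
print(parts("26 sep 2016"))
print(parts("26 sep 2016 at 09:00"))
```

If the extra group is unwanted, writing the wrapper as (?: ... )? instead keeps hh:mm in group 4.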
One more extension: if you also allow the hour/minute part alone, add |(\d{1,2}:\d{1,2}) to the regex (everything before it is the first variant, and the added part is the second variant, for just hour/minute). Then your variants No 4 and 6 will also be matched.
For a working example see https://regex101.com/r/33t1ps/1
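With the alternation added, the time can end up in one of two different groups depending on which variant matched, so the calling code has to check both. A minimal sketch, again assuming Python's re module instead of VBA; the names DATE_OR_TIME and find_times and the sample sentence are hypothetical.

```python
import re

# Month alternation copied from the answer's regex.
MONTH = (r"jan(?:uari)?|feb(?:ruari)?|m(?:aa)?rt|apr(?:il)?|mei|juni?|juli?"
         r"|aug(?:ustus)?|sep(?:tember|t)?|okt(?:ober)?|nov(?:ember)?|dec(?:ember)?")

DATE_OR_TIME = re.compile(
    # Variant 1: full date, optionally followed by a time (groups 1-5).
    r"(\d{1,2})[ -](" + MONTH + r")[ -](\d{4})"
    r"( (?:around |at )?(\d{1,2}:\d{1,2}))?"
    # Variant 2: a time on its own (group 6).
    r"|(\d{1,2}:\d{1,2})",
    re.IGNORECASE,
)

def find_times(text):
    """Collect hh:mm from every match, whichever variant produced it."""
    found = []
    for m in DATE_OR_TIME.finditer(text):
        time = m.group(5) or m.group(6)  # group 5: after a date, group 6: time alone
        if time:
            found.append(time)
    return found

# Hypothetical sentence containing one dated time and one bare time.
print(find_times("Meeting 26 sep 2016 at 09:00, lunch at 12:30"))
```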
Following your list of rules, I propose the following regex:
(\d{1,2}[ -])? - Day + separator, optional.
(jan(?:uari)?|...|dec(?:ember)?) - Month.
(?:[ -](\d{4}|'\d{2}))? - Separator + year (either 4 digits, or 2 digits preceded by "'").
( (?:around |at )?(\d{1,2}:\d{1,2}))? - Separator + hour/minute - optional end of variant 1.
|(\d{1,2}:\d{1,2}) - Variant 2 - only hour and minute.

The only variants it does not match are your No 9 and 10.
For the full regex, which also handles "uur", see https://regex101.com/r/33t1ps/3
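The fragments listed above can be put together as in the sketch below. This is an assumption-laden reconstruction, not the exact regex behind the regex101 link: the abbreviated month alternation is filled in from the earlier regex in this answer, the "uur" handling is left out, and Python's re module stands in for VBA. The names RULES and groups and the sample strings are invented.

```python
import re

# Month alternation taken from the earlier regex in this answer (assumed unchanged).
MONTH = (r"jan(?:uari)?|feb(?:ruari)?|m(?:aa)?rt|apr(?:il)?|mei|juni?|juli?"
         r"|aug(?:ustus)?|sep(?:tember|t)?|okt(?:ober)?|nov(?:ember)?|dec(?:ember)?")

RULES = re.compile(
    r"(\d{1,2}[ -])?"                          # group 1: day plus its separator, optional
    r"(" + MONTH + r")"                        # group 2: month
    r"(?:[ -](\d{4}|'\d{2}))?"                 # group 3: year, 4 digits or 'yy
    r"( (?:around |at )?(\d{1,2}:\d{1,2}))?"   # group 4: wrapper, group 5: hh:mm
    r"|(\d{1,2}:\d{1,2})",                     # group 6: time on its own
    re.IGNORECASE,
)

def groups(text):
    """Return all six groups of the first match, or None."""
    m = RULES.search(text)
    return m.groups() if m else None

# Hypothetical inputs covering a full date/time, month plus short year, and a bare time.
print(groups("26 sep 2016 09:00"))
print(groups("jun '18"))
print(groups("14:30"))
```

Note that group 1 keeps the trailing separator (for example "26 "), so it needs trimming before use. Also, since only the month is mandatory in variant 1, a month abbreviation inside a longer word can match; adding \b around the pattern is one possible guard.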
Upvotes: 0