Reputation: 1784
I am trying to figure out how to write a regex
that will match a time. The time can look like this: 11:15-12:15
or 11-12:15
or 11-12
and so on. What I currently have is this:
\\d{2}:?\\d{0,2}-{1}\\d{2}:?\\d{0,2}
which works until a date comes along. This regex
also matches inside a date string such as 2013-11-05. I don't want it to find dates. I know I should use a lookbehind
but I can't get it to work.
I am using the Jsoup Element.getElementsMatchingOwnText method, if that information is of any interest.
The time string is embedded in HTML source like this (but with more text above and below):
<td class="text">2013-11-04</td>
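To make the failure concrete, here is a minimal sketch that runs the pattern from the question against a time range and against the date 2013-11-05. The class and method names are made up for illustration; only java.util.regex is used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name is not from the original post.
class DateFalsePositiveDemo {
    // The pattern from the question, unchanged.
    static final Pattern ORIGINAL =
            Pattern.compile("\\d{2}:?\\d{0,2}-{1}\\d{2}:?\\d{0,2}");

    // Collects every substring the pattern finds in the input.
    static List<String> findAll(String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = ORIGINAL.matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(findAll("11:15-12:15")); // prints [11:15-12:15]
        System.out.println(findAll("2013-11-05"));  // prints [2013-11]
    }
}
```

The date yields the match 2013-11 because the optional colon and the optional second digit group let the first four digits of the year be read as the hour and minute part.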
Upvotes: 3
Views: 242
Reputation: 10136
Try this. Start with the base regex:
\d{1,2}(:\d\d)?-\d{1,2}(:\d\d)?
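That is: an hour of one or two digits, an optional colon plus two-digit minute, a dash, then the same again.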
This matches all your core cases:
11-12
1-2
1:15-2
10-3:45
2:15-11:30
etc. Now mix in negative lookbehind and negative lookahead to invalidate matches that appear within undesired contexts. Let's invalidate the match when a digit or dash or colon appears directly to the left or right of the match:
The negative lookbehind: (?<!\d|-|:)
The negative lookahead: (?!\d|-|:)
Slap the negative lookbehind at the beginning and the negative lookahead at the end, and you get:
(?<!\d|-|:)(\d{1,2}(:\d\d)?-\d{1,2}(:\d\d)?)(?!\d|-|:)
or as a Java String (by request):
Pattern p = Pattern.compile("(?<!\\d|-|:)(\\d{1,2}(:\\d\\d)?-\\d{1,2}(:\\d\\d)?)(?!\\d|-|:)");
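As a quick check of the lookaround version, the following sketch runs it over a sample string that mixes time ranges with a date. The class name, the helper method, and the sample text are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original answer.
class LookaroundDemo {
    // The lookaround pattern from above, as a Java string.
    static final Pattern TIME_RANGE = Pattern.compile(
            "(?<!\\d|-|:)(\\d{1,2}(:\\d\\d)?-\\d{1,2}(:\\d\\d)?)(?!\\d|-|:)");

    // Returns group 1 (the whole time range) for every match in the input.
    static List<String> findRanges(String input) {
        List<String> ranges = new ArrayList<>();
        Matcher m = TIME_RANGE.matcher(input);
        while (m.find()) {
            ranges.add(m.group(1));
        }
        return ranges;
    }

    public static void main(String[] args) {
        String sample = "2013-11-05 lecture 11:15-12:15, lab 11-12:15, break 1-2";
        // The date is skipped; prints [11:15-12:15, 11-12:15, 1-2]
        System.out.println(findRanges(sample));
    }
}
```

Every candidate start position inside the date is preceded by a digit or a dash, or is not followed by a dash, so the date produces no match.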
Now while the lookaround has eliminated matches within dates, you're still matching some silly things like 99:99-88:88 because \d matches any digit 0-9. You can mix more restrictive character classes into this regex to address that issue. For example, with a 12-hour clock:
For the hour part, use
(1[0-2]|0?[1-9])
instead of
\d{1,2}
For the minute part use
(0[0-9]|[1-5][0-9])
instead of
\d\d
Mixing the more restrictive character classes into the regex yields this nearly impossible to grok and maintain beast:
(?<!\d|-|:)(((1[0-2]|0?[1-9]))(:((0[0-9]|[1-5][0-9])))?-(1[0-2]|0?[1-9])(:((0[0-9]|[1-5][0-9])))?)(?!\d|-|:)
As Java code:
Pattern p = Pattern.compile("(?<!\\d|-|:)(((1[0-2]|0?[1-9]))(:((0[0-9]|[1-5][0-9])))?-(1[0-2]|0?[1-9])(:((0[0-9]|[1-5][0-9])))?)(?!\\d|-|:)");
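One way to keep the strict version readable is to assemble it from named pieces instead of writing it as a single literal. This is only a sketch: the class and constant names are hypothetical, and it uses non-capturing groups, so the group numbering differs from the hand-written pattern above.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the constant names are made up for this sketch.
class StrictTimeRange {
    // Hour on a 12-hour clock: 1-12, with an optional leading zero for 1-9.
    static final String HOUR = "(?:1[0-2]|0?[1-9])";
    // Minute: 00-59.
    static final String MINUTE = "(?:0[0-9]|[1-5][0-9])";
    // A time is an hour, optionally followed by a colon and a minute.
    static final String TIME = HOUR + "(?::" + MINUTE + ")?";
    // Reject matches that touch a digit, dash, or colon on either side.
    static final String NOT_BEFORE = "(?<!\\d|-|:)";
    static final String NOT_AFTER = "(?!\\d|-|:)";

    // Group 1 is the whole time range.
    static final Pattern TIME_RANGE =
            Pattern.compile(NOT_BEFORE + "(" + TIME + "-" + TIME + ")" + NOT_AFTER);

    static boolean containsRange(String input) {
        return TIME_RANGE.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(containsRange("open 10-3:45")); // prints true
        System.out.println(containsRange("99:99-88:88"));  // prints false
        System.out.println(containsRange("2013-11-05"));   // prints false
    }
}
```

With this layout, switching to a 24-hour clock would only mean changing the HOUR constant.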
Upvotes: 3
Reputation: 48693
Simple method:
((\d{2}(:\d{2})?)-?){2}
A safer, more verbose regular expression:
([0-1]?[0-9]|[2][0-3])(:([0-5][0-9]))?-([0-1]?[0-9]|[2][0-3])(:([0-5][0-9]))?
Example in action:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class App {
    private static final String TIME_FORMAT = "%02d:%02d";
    // Groups 1 and 3: first hour and minute; groups 4 and 6: second hour and minute.
    private static final String TIME_RANGE = "([0-1]?[0-9]|[2][0-3])(:([0-5][0-9]))?-([0-1]?[0-9]|[2][0-3])(:([0-5][0-9]))?";

    public static void main(String[] args) {
        String passage = "The time can look like this: 11:15-12:15 or 11-12:15 or 11-12 and so on.";
        Pattern pattern = Pattern.compile(TIME_RANGE);
        Matcher matcher = pattern.matcher(passage);
        int count = 0;
        while (matcher.find()) {
            String time1 = formattedTime(matcher.group(1), matcher.group(3));
            String time2 = formattedTime(matcher.group(4), matcher.group(6));
            System.out.printf("Time #%d: %s - %s\n", count, time1, time2);
            count++;
        }
    }

    // Formats an hour and an optional minute as HH:MM.
    private static String formattedTime(String strHour, String strMinute) {
        int intHour = parseInt(strHour);
        int intMinute = parseInt(strMinute);
        return String.format(TIME_FORMAT, intHour, intMinute);
    }

    // A missing (null) group, such as an omitted minute part, is treated as 0.
    private static int parseInt(String str) {
        return str != null ? Integer.parseInt(str) : 0;
    }
}
Output:
Time #0: 11:15 - 12:15
Time #1: 11:00 - 12:15
Time #2: 11:00 - 12:00
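Note that this pattern has no guard against neighbouring digits or dashes, so it still finds a match inside a date such as 2013-11-04. One possible way to address that is to wrap it in the lookarounds from the other answer. The sketch below is an assumption about how the two could be combined, not part of this answer; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class combining this answer's pattern with lookaround guards.
class GuardedTimeRange {
    static final String TIME_RANGE =
            "([0-1]?[0-9]|[2][0-3])(:([0-5][0-9]))?-([0-1]?[0-9]|[2][0-3])(:([0-5][0-9]))?";

    static final Pattern UNGUARDED = Pattern.compile(TIME_RANGE);
    // No digit, colon, or dash may sit directly before or after the match.
    // Lookarounds do not capture, so the group numbers stay the same.
    static final Pattern GUARDED =
            Pattern.compile("(?<![\\d:-])" + TIME_RANGE + "(?![\\d:-])");

    // Returns the first match, or null if the pattern finds nothing.
    static String firstMatch(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch(UNGUARDED, "2013-11-04")); // prints 13-11
        System.out.println(firstMatch(GUARDED, "2013-11-04"));   // prints null
        System.out.println(firstMatch(GUARDED, "11-12:15"));     // prints 11-12:15
    }
}
```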
Upvotes: 1