Johan Lindkvist

Reputation: 1784

Can't get regex to work

I am trying to figure out how to write a regex that will match a time range. The time can look like this: 11:15-12:15 or 11-12:15 or 11-12 and so on. What I currently have is this:

\\d{2}:?\\d{0,2}-{1}\\d{2}:?\\d{0,2}

which works until a date comes along. This regex also matches part of a string like 2013-11-05. I don't want it to find dates. I know I should use a lookbehind, but I can't get it to work.

I am using the Jsoup Element getElementsMatchingOwnText method, if that information is of any interest.

The time string is included in an HTML source like this (but with more text above and below):

<td class="text">2013-11-04</td>

Upvotes: 3

Views: 242

Answers (2)

Mike Clark

Reputation: 10136

Try this. Start with the base regex:

\d{1,2}(:\d\d)?-\d{1,2}(:\d\d)?

That is:

  • one-to-two digits, optionally followed by : and two more digits
  • followed by a hyphen
  • followed by one-to-two digits, optionally followed by : and two more digits

This matches all your core cases:

11-12
1-2
1:15-2
10-3:45
2:15-11:30

etc. Now mix in negative lookbehind and negative lookahead to invalidate matches that appear within undesired contexts. Let's invalidate the match when a digit or dash or colon appears directly to the left or right of the match:

The negative lookbehind: (?<!\d|-|:)

The negative lookahead: (?!\d|-|:)

Slap the neg-lookbehind at the beginning and the neg-lookahead at the end, and you get:

(?<!\d|-|:)(\d{1,2}(:\d\d)?-\d{1,2}(:\d\d)?)(?!\d|-|:)

or as a Java String (by request):

Pattern p = Pattern.compile("(?<!\\d|-|:)(\\d{1,2}(:\\d\\d)?-\\d{1,2}(:\\d\\d)?)(?!\\d|-|:)");
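To see the lookaround in action, here is a minimal sketch that runs the pattern above over plain text with Matcher.find(). The class name LookaroundDemo, the helper findRanges, and the sample sentence are made up for illustration; only the regex itself comes from this answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaroundDemo {
    // Same pattern as above: a time range that is not directly
    // preceded or followed by a digit, a dash, or a colon.
    static final Pattern TIME_RANGE = Pattern.compile(
            "(?<!\\d|-|:)(\\d{1,2}(:\\d\\d)?-\\d{1,2}(:\\d\\d)?)(?!\\d|-|:)");

    // Hypothetical helper: collects every time range found in the text.
    static List<String> findRanges(String text) {
        List<String> ranges = new ArrayList<>();
        Matcher m = TIME_RANGE.matcher(text);
        while (m.find()) {
            ranges.add(m.group(1)); // group 1 is the whole range
        }
        return ranges;
    }

    public static void main(String[] args) {
        // The date is skipped; the two ranges are found.
        System.out.println(findRanges("Class 2013-11-04, meeting 11:15-12:15, lunch 1-2"));
        // prints [11:15-12:15, 1-2]
    }
}
```

Every candidate position inside the date has a digit or a dash directly before it, so the lookbehind rejects it before the rest of the pattern is tried.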

Now while the lookaround has eliminated matches within dates, you're still matching some silly things like 99:99-88:88 because \d matches any digit 0-9. You can mix more restrictive character classes into this regex to address that issue. For example, with a 12-hour clock:

For the hour part, use

(1[0-2]|0?[1-9])

instead of

\d{1,2}

For the minute part use

(0[0-9]|[1-5][0-9])

instead of

\d\d

Mixing the more restrictive character classes into the regex yields this nearly impossible to grok and maintain beast:

(?<!\d|-|:)(((1[0-2]|0?[1-9]))(:((0[0-9]|[1-5][0-9])))?-(1[0-2]|0?[1-9])(:((0[0-9]|[1-5][0-9])))?)(?!\d|-|:)

As Java code:

Pattern p = Pattern.compile("(?<!\\d|-|:)(((1[0-2]|0?[1-9]))(:((0[0-9]|[1-5][0-9])))?-(1[0-2]|0?[1-9])(:((0[0-9]|[1-5][0-9])))?)(?!\\d|-|:)");
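One way to tame the beast is to build it from named pieces. The following is only a sketch under a few assumptions: the constant names and the class ComposedTimeRange are invented here, the character class [\d:-] is used as a shorter equivalent of \d|-|:, [0-5][0-9] stands in for (0[0-9]|[1-5][0-9]), and the capturing groups are replaced by non-capturing ones, so group numbers differ from the pattern above.

```java
import java.util.regex.Pattern;

public class ComposedTimeRange {
    // 12-hour clock: 10-12, or 1-9 with an optional leading zero.
    static final String HOUR = "(?:1[0-2]|0?[1-9])";
    // Minutes 00-59.
    static final String MINUTE = "[0-5][0-9]";
    // An hour, optionally followed by a colon and minutes.
    static final String TIME = HOUR + "(?::" + MINUTE + ")?";
    // No digit, colon, or dash directly before or after the match.
    static final String NOT_BEFORE = "(?<![\\d:-])";
    static final String NOT_AFTER = "(?![\\d:-])";

    static final Pattern TIME_RANGE =
            Pattern.compile(NOT_BEFORE + TIME + "-" + TIME + NOT_AFTER);

    // Hypothetical helper: true if the text contains at least one time range.
    static boolean containsRange(String text) {
        return TIME_RANGE.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsRange("2:15-11:30"));  // true
        System.out.println(containsRange("99:99-88:88")); // false
        System.out.println(containsRange("2013-11-05"));  // false
    }
}
```

Because this is a 12-hour pattern, a range such as 13-14 is rejected as well; widen HOUR if 24-hour times are needed.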

Upvotes: 3

Mr. Polywhirl

Reputation: 48693

Simple method:

((\d{2}(:\d{2})?)-?){2}

A safer, more verbose regular expression:

([0-1]?[0-9]|[2][0-3])(:([0-5][0-9]))?-([0-1]?[0-9]|[2][0-3])(:([0-5][0-9]))?

Example in action:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class App {
    private static final String TIME_FORMAT = "%02d:%02d";
    private static final String TIME_RANGE = "([0-1]?[0-9]|[2][0-3])(:([0-5][0-9]))?-([0-1]?[0-9]|[2][0-3])(:([0-5][0-9]))?";

    public static void main(String[] args) {
        String passage = "The time can look like this: 11:15-12:15 or 11-12:15 or 11-12 and so on.";
        Pattern pattern = Pattern.compile(TIME_RANGE);
        Matcher matcher = pattern.matcher(passage);
        int count = 0;

        while (matcher.find()) {
            // Groups 1 and 4 are the hours; groups 3 and 6 are the optional minutes (null when absent).
            String time1 = formattedTime(matcher.group(1), matcher.group(3));
            String time2 = formattedTime(matcher.group(4), matcher.group(6));
            System.out.printf("Time #%d: %s - %s%n", count, time1, time2);
            count++;
        }
    }

    private static String formattedTime(String strHour, String strMinute) {
        int intHour = parseInt(strHour);
        int intMinute = parseInt(strMinute);

        return String.format(TIME_FORMAT, intHour, intMinute);
    }

    private static int parseInt(String str) {
        return str != null ? Integer.parseInt(str) : 0;
    }
}

Output:

Time #0: 11:15 - 12:15
Time #1: 11:00 - 12:15
Time #2: 11:00 - 12:00
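Note that the verbose pattern on its own still finds a match inside a date such as the one in the question. A possible fix, sketched here as an assumption rather than as part of the original answer, is to wrap it in the negative lookbehind and lookahead from the other answer. The class name GuardedTimeRange and the helper findAll are hypothetical. Lookarounds do not capture, so the group numbers used in the example above stay the same.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GuardedTimeRange {
    // The verbose 24-hour pattern from this answer, unchanged.
    static final String TIME_RANGE =
            "([0-1]?[0-9]|[2][0-3])(:([0-5][0-9]))?-([0-1]?[0-9]|[2][0-3])(:([0-5][0-9]))?";

    // Same pattern, but not directly preceded or followed by a digit, colon, or dash.
    static final String GUARDED = "(?<![\\d:-])" + TIME_RANGE + "(?![\\d:-])";

    // Hypothetical helper: returns every substring of text that regex matches.
    static List<String> findAll(String regex, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findAll(TIME_RANGE, "2013-11-04")); // prints [13-11]
        System.out.println(findAll(GUARDED, "2013-11-04"));    // prints []
        System.out.println(findAll(GUARDED, "11:15-12:15 or 11-12:15 or 11-12"));
        // prints [11:15-12:15, 11-12:15, 11-12]
    }
}
```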

Upvotes: 1
