1qazxsw2

Reputation: 5

Pattern that doesn't match invalid dates like 20999-11-11 and 2009-01-111

The file daty.txt contains various dates; some are valid and others are not. My current pattern is \\d{4}-\\d{2}-\\d{2}. Unfortunately, it also matches inside invalid strings like 20999-11-11 (as 0999-11-11) and 2009-01-111 (as 2009-01-11).

I need a pattern which will match valid dates, and not match the invalid ones.

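import java.io.File;
import java.io.FileNotFoundException;
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    public static void main(String[] args) {
        String fname = System.getProperty("user.home") + "/daty.txt";
        List<String> list = Dates(fname, "\\d{4}-\\d{2}-\\d{2}");
        System.out.println(list);
    }

    // Returns every substring that matches the pattern and parses as a real calendar date.
    public static List<String> Dates(String file, String pattern) {
        List<String> result = new ArrayList<>();
        Pattern p = Pattern.compile(pattern);
        // Non-lenient parsing rejects impossible dates such as 2009-02-30.
        DateFormat df = new SimpleDateFormat("yyyy-MM-dd");
        df.setLenient(false);
        // try-with-resources closes the Scanner (and the underlying file) automatically.
        try (Scanner scan = new Scanner(new File(file))) {
            while (scan.hasNextLine()) {
                Matcher m = p.matcher(scan.nextLine());
                while (m.find()) {
                    String date = m.group();
                    try {
                        df.parse(date);
                        result.add(date);
                    } catch (ParseException ex) {
                        System.out.println(ex.getMessage());
                    }
                }
            }
        } catch (FileNotFoundException ex) {
            System.out.println(ex.getMessage());
        }
        return result;
    }
}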

Upvotes: 1

Views: 68

Answers (2)

sp00m

Reputation: 48817

You could also use word boundaries:

\\b\\d{4}-\\d{2}-\\d{2}\\b

See http://www.regular-expressions.info/wordboundaries.html
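
As a rough illustration of how the word-boundary version behaves, here is a minimal sketch. The class name WordBoundaryDemo, the helper findDates, and the sample input are made up for this example and are not from the question. Note that \b also refuses a match that directly follows a letter, as in x2009-01-12, because no boundary exists between two word characters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordBoundaryDemo {

    // Pattern from this answer: the date must start and end at a word boundary.
    public static final Pattern DATE = Pattern.compile("\\b\\d{4}-\\d{2}-\\d{2}\\b");

    // Hypothetical helper: collects every match found in the given text.
    public static List<String> findDates(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = DATE.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Sample input invented for illustration.
        String sample = "2009-01-11 20999-11-11 2009-01-111 x2009-01-12";
        System.out.println(findDates(sample)); // prints [2009-01-11]
    }
}
```

Passing "\\b\\d{4}-\\d{2}-\\d{2}\\b" as the second argument to Dates in the question should give the same filtering on each line of the file.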

Upvotes: 0

CupawnTae

Reputation: 14580

You can use negative lookbehind and lookahead to say "only match if it's not preceded by a digit" and "only match if it's not followed by a digit":

"(?<!\\d)\\d{4}-\\d{2}-\\d{2}(?!\\d)"

Here (?<!\d) means "not preceded by a digit" and (?!\d) means "not followed by a digit".

More detailed explanation: http://www.regular-expressions.info/lookaround.html
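
A minimal sketch of the lookaround pattern in action, assuming a sample string invented for this example (the class name LookaroundDemo and the helper findDates are also hypothetical). Unlike a word boundary, the lookarounds only rule out neighbouring digits, so a date glued to a letter, as in x2009-01-12, is still matched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaroundDemo {

    // Pattern from this answer: no digit directly before or after the date.
    public static final Pattern DATE =
            Pattern.compile("(?<!\\d)\\d{4}-\\d{2}-\\d{2}(?!\\d)");

    // Hypothetical helper: collects every match found in the given text.
    public static List<String> findDates(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = DATE.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Sample input invented for illustration.
        String sample = "2009-01-11 20999-11-11 2009-01-111 x2009-01-12";
        System.out.println(findDates(sample)); // prints [2009-01-11, 2009-01-12]
    }
}
```

The pattern only checks the shape of the text; the non-lenient SimpleDateFormat check in the question is still what rejects impossible dates such as 2009-02-30.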

Upvotes: 1
