Reputation: 5
The file daty.txt contains various dates; some are valid and others are not. My current pattern is \\d{4}-\\d{2}-\\d{2}. Unfortunately, it also finds matches inside malformed strings such as 20999-11-11 and 2009-01-111.
I need a pattern which will match valid dates, and not match the invalid ones.
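import java.io.File;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String fname = System.getProperty("user.home") + "/daty.txt";
        List<String> list = Dates(fname, "\\d{4}-\\d{2}-\\d{2}");
        System.out.println(list.toString());
    }

    // Collects every substring of the file that matches the pattern
    // and also parses as a real calendar date.
    public static List<String> Dates(String file, String pattern) {
        List<String> result = new ArrayList<>();
        Pattern p = Pattern.compile(pattern);
        // Non-lenient parsing rejects impossible dates such as 2009-02-30.
        DateFormat df = new SimpleDateFormat("yyyy-MM-dd");
        df.setLenient(false);
        // try-with-resources closes the Scanner (and the file) when done.
        try (Scanner scan = new Scanner(new File(file))) {
            while (scan.hasNextLine()) {
                Matcher m = p.matcher(scan.nextLine());
                while (m.find()) {
                    String date = m.group();
                    try {
                        df.parse(date);
                        result.add(date);
                    } catch (Exception ex) {
                        System.out.println(ex.getMessage());
                    }
                }
            }
        } catch (Exception ex) {
            System.out.println(ex.getMessage());
        }
        return result;
    }
}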
Upvotes: 1
Views: 68
Reputation: 48817
You could also use word boundaries:
\\b\\d{4}-\\d{2}-\\d{2}\\b
See http://www.regular-expressions.info/wordboundaries.html
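A minimal sketch of how the word-boundary pattern behaves on the inputs from the question. The class name WordBoundaryDemo, the helper findDates, and the sample string are made up for illustration. Note that \b also rejects a date glued to a letter or underscore (the x2009-01-11 token below), which may or may not be what you want.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original question.
class WordBoundaryDemo {
    // Date shape delimited by word boundaries on both sides.
    private static final Pattern DATE =
            Pattern.compile("\\b\\d{4}-\\d{2}-\\d{2}\\b");

    // Returns every substring of text that matches DATE, in order.
    public static List<String> findDates(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = DATE.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Only the first token is delimited by word boundaries on both sides.
        System.out.println(
                findDates("2009-01-11 20999-11-11 2009-01-111 x2009-01-11"));
    }
}
```

The pattern only checks the shape of the text, so something like 2009-13-45 would still match; the strict SimpleDateFormat parse in the question's code is still needed to reject it.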
Upvotes: 0
Reputation: 14580
You can use negative lookbehind and lookahead to say "only match if it's not preceded by a digit" and "only match if it's not followed by a digit":
"(?<!\\d)\\d{4}-\\d{2}-\\d{2}(?!\\d)"
Here (?<!\d) means "not preceded by a digit" and (?!\d) means "not followed by a digit".
More detailed explanation: http://www.regular-expressions.info/lookaround.html
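As a rough illustration of the lookaround pattern, here is a small sketch run against the inputs from the question. The class name LookaroundDemo, the helper findDates, and the sample string are assumptions made for this example. Unlike a word boundary, the lookarounds only forbid neighbouring digits, so a date embedded between letters is still found.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original question.
class LookaroundDemo {
    // Date shape that must not be preceded or followed by another digit.
    private static final Pattern DATE =
            Pattern.compile("(?<!\\d)\\d{4}-\\d{2}-\\d{2}(?!\\d)");

    // Returns every substring of text that matches DATE, in order.
    public static List<String> findDates(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = DATE.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // 20999-11-11 and 2009-01-111 are skipped because a digit
        // touches the candidate match on one side.
        System.out.println(
                findDates("x2009-01-11y 20999-11-11 2009-01-111 2009-01-11"));
    }
}
```

Passing this pattern to the question's Dates method in place of the original one should keep the strict date parsing while dropping the partial matches.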
Upvotes: 1