Reputation: 27
I want to process a tab-delimited source file that has more than 100 columns and contains lines like the one below.
private static Matcher FILE_NAME_REGEX = Pattern.compile("^\\w+\\d(F|G|H|J|K|M|N|Q|U|V|X|Z)\t169\t3(.*\t){26}\\d{4}/\\d{2}/\\d{2}.*",Pattern.CASE_INSENSITIVE).matcher("");
String line = "CGAS0Z 169 3 38977.5 02:30:00 -350 76000 75700 2255 76000 76000 76000 588 2 76000 06:35:15 2013/03/04 2013/03/05 02:17:40 CGAS 1 JPY CHUKYO Gasoline Futures CHUKYO Gasoline CONT (CGAS3H) JP FUD 169 RES XTKT 2013/03/05 2013/03/05 2013/03/05 10 76350 10 81950 61500 4296057 19178.8258928571 224 CGAS 2013/03/25 116.3987300506 0.5196371877 75700 2255 0.7841672 8.582539 23.298309 12.458333";
if (FILE_NAME_REGEX.reset(line).matches()) {
System.out.println(":)");
} else {
System.out.println(":(");
}
When I test this code, it takes an extremely long time to run. Can someone please explain what's wrong with it?
Upvotes: 1
Views: 182
Reputation: 138147
I'd try it like this:
Pattern.compile("^\\w+\\d[FGHJKMNQUVXZ]\t169\t3([^\t]*\t){26}\\d{4}/\\d{2}/\\d{2}.*",Pattern.CASE_INSENSITIVE)
([^\t]*\t){26} should be much quicker than (.*\t){26}, because it only has one way to match the text. This can also be adapted to support tabs in quoted values, if needed.
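To illustrate why the negated class helps: in (.*\t){26} the dot also matches tabs, so each repetition can end at any tab and a failed match makes the engine retry a huge number of splits. With [^\t]* each repetition has to stop at the next tab. Here is a minimal sketch of that; the class name, the helper methods and the synthetic sample line (filler fields x0..x24, a trailing "tail" column) are made up for the demo and are not the asker's real data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class TabFieldMatch {
    // Pattern from the answer: every repetition of [^\t]*\t consumes exactly one field.
    static final Pattern LINE = Pattern.compile(
            "^\\w+\\d[FGHJKMNQUVXZ]\t169\t3([^\t]*\t){26}\\d{4}/\\d{2}/\\d{2}.*",
            Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: builds a synthetic tab-delimited line.
    // The first repetition of the group consumes the tab right after "3",
    // so 25 filler fields put dateField where the pattern expects a date.
    static String sampleLine(String dateField) {
        List<String> fields = new ArrayList<>();
        fields.add("CGAS0Z");
        fields.add("169");
        fields.add("3");
        for (int i = 0; i < 25; i++) {
            fields.add("x" + i);
        }
        fields.add(dateField);
        fields.add("tail");
        return String.join("\t", fields);
    }

    static boolean matches(String line) {
        return LINE.matcher(line).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches(sampleLine("2013/03/05"))); // date in the expected column
        System.out.println(matches(sampleLine("not-a-date"))); // fails without heavy backtracking
    }
}
```

The second call is the interesting one: a non-matching line is where (.*\t){26} spends its time, while the [^\t]* version has only one way to split the fields and gives up quickly.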
Also, you will want to use the (?m) or Pattern.MULTILINE flag if you want ^ to work as expected when searching a whole file, and not just a single line.
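For the whole-file case, something along these lines might work. It is only a sketch: the class name, the record() helper and the sample content are hypothetical. One thing to watch when scanning a whole file is that [^\t] also matches line breaks, so this version uses [^\t\r\n] (assuming fields never contain newlines) to keep a match from running across lines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultilineScan {
    // MULTILINE lets ^ and $ match at every line boundary, not only at the
    // start and end of the input. \r and \n are excluded from the field class
    // (an assumption: no field contains a line break).
    static final Pattern RECORD = Pattern.compile(
            "^\\w+\\d[FGHJKMNQUVXZ]\t169\t3([^\t\r\n]*\t){26}\\d{4}/\\d{2}/\\d{2}.*$",
            Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);

    // Hypothetical helper: one synthetic record with the date in the expected column.
    static String record(String symbol) {
        StringBuilder sb = new StringBuilder(symbol).append("\t169\t3");
        for (int i = 0; i < 25; i++) {
            sb.append('\t').append('x').append(i);
        }
        return sb.append("\t2013/03/05\ttail").toString();
    }

    // Returns the first column of every line in content that matches RECORD.
    static List<String> matchingSymbols(String content) {
        List<String> symbols = new ArrayList<>();
        Matcher m = RECORD.matcher(content);
        while (m.find()) {
            String hit = m.group();
            symbols.add(hit.substring(0, hit.indexOf('\t')));
        }
        return symbols;
    }

    public static void main(String[] args) {
        String content = record("CGAS0Z") + "\n"
                + "some header line\n"
                + record("CGAS1H") + "\n";
        System.out.println(matchingSymbols(content));
    }
}
```

Without the MULTILINE flag, ^ would only match at the very start of the input, so find() could never report the record on the third line.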
Upvotes: 4