Sushant Sukhi

Reputation: 27

Regex issue - Matcher taking too long to match

I want to process a source file that contains lines like the one below. The file is tab-delimited and has more than 100 columns.

private static Matcher FILE_NAME_REGEX = Pattern.compile("^\\w+\\d(F|G|H|J|K|M|N|Q|U|V|X|Z)\t169\t3(.*\t){26}\\d{4}/\\d{2}/\\d{2}.*",Pattern.CASE_INSENSITIVE).matcher("");

    String line = "CGAS0Z   169 3   38977.5 02:30:00    -350    76000   75700   2255        76000   76000   76000       588             2                               76000   06:35:15    2013/03/04                  2013/03/05  02:17:40    CGAS    1   JPY CHUKYO Gasoline                 Futures CHUKYO Gasoline CONT (CGAS3H)           JP      FUD         169                         RES     XTKT    2013/03/05  2013/03/05  2013/03/05          10  76350                                       10                                  81950   61500       4296057 19178.8258928571    224 CGAS        2013/03/25  116.3987300506  0.5196371877        75700   2255                    0.7841672   8.582539    23.298309           12.458333";

    if (FILE_NAME_REGEX.reset(line).matches()) {
        System.out.println(":)");
    } else {
        System.out.println(":(");
    }

When I test this code, it takes an extremely long time to run. Can someone please explain what's wrong with it?

Upvotes: 1

Views: 182

Answers (1)

Kobi

Reputation: 138147

I'd try it like this:

Pattern.compile("^\\w+\\d[FGHJKMNQUVXZ]\t169\t3([^\t]*\t){26}\\d{4}/\\d{2}/\\d{2}.*",Pattern.CASE_INSENSITIVE)

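([^\t]*\t){26} should be much quicker than (.*\t){26}, because it only has one way to match the text. The . can itself match a tab, so (.*\t){26} lets the engine backtrack through a huge number of ways of distributing the line's tabs among the 26 repetitions, whereas [^\t]* always stops at the next tab. This can also be adapted to support tabs in quoted values, if needed.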
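To try this out in isolation, here is a minimal sketch. The class name TabFieldRegexDemo and the helpers buildLine and matchesLine are made up for illustration, and the test line is synthetic (25 dummy columns between the "3" and the date), not the real data from the question.

```java
import java.util.regex.Pattern;

class TabFieldRegexDemo {
    // Same pattern as above: each skipped column is "non-tab characters, then one tab",
    // so the 26 repetitions can only line up with the tabs in one way.
    static final Pattern LINE = Pattern.compile(
            "^\\w+\\d[FGHJKMNQUVXZ]\t169\t3([^\t]*\t){26}\\d{4}/\\d{2}/\\d{2}.*",
            Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: builds a synthetic tab-delimited line with exactly
    // 26 tabs between the "3" column and the given date column.
    static String buildLine(String dateColumn) {
        StringBuilder sb = new StringBuilder("CGAS0Z\t169\t3");
        for (int i = 0; i < 25; i++) {
            sb.append('\t').append('x').append(i);
        }
        sb.append('\t').append(dateColumn);
        sb.append("\ttrailing\tcolumns");
        return sb.toString();
    }

    // matches() requires the whole line to match, as in the question's code.
    static boolean matchesLine(String line) {
        return LINE.matcher(line).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesLine(buildLine("2013/03/04")));
        System.out.println(matchesLine(buildLine("not-a-date")));
    }
}
```

The second call is the interesting one: when the date column does not match, the pattern fails right away instead of retrying other ways of splitting the line.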

Also, you will want to use the (?m) or Pattern.MULTILINE flag if you want ^ to work as expected when searching a whole file, and not just a single line.
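A rough sketch of the whole-file variant, using find() instead of matches(). The names MultilineRegexDemo, sampleLine and findMatchingLines are hypothetical, and the input is synthetic. Note one change that is my own assumption rather than part of the pattern above: once the input contains several lines, [^\t]* can also run across line breaks, so the sketch uses [^\t\r\n]* to keep each match on a single line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultilineRegexDemo {
    // MULTILINE makes ^ match at the start of every line, not only at the start of the input.
    // [^\t\r\n]* (an assumption added here) stops a field from spilling onto the next line.
    static final Pattern LINE = Pattern.compile(
            "^\\w+\\d[FGHJKMNQUVXZ]\t169\t3([^\t\r\n]*\t){26}\\d{4}/\\d{2}/\\d{2}.*",
            Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);

    // Hypothetical helper: a synthetic line with 26 tabs between "3" and the date column.
    static String sampleLine(String symbol, String date) {
        StringBuilder sb = new StringBuilder(symbol).append("\t169\t3");
        for (int i = 0; i < 25; i++) {
            sb.append('\t').append('x');
        }
        return sb.append('\t').append(date).append("\ttail").toString();
    }

    // Collects every line of the input that matches the pattern.
    static List<String> findMatchingLines(String fileContent) {
        List<String> hits = new ArrayList<>();
        Matcher m = LINE.matcher(fileContent);
        while (m.find()) {
            hits.add(m.group()); // .* does not cross line breaks, so this is one line
        }
        return hits;
    }

    public static void main(String[] args) {
        String content = "HEADER\tline\n"
                + sampleLine("CGAS0Z", "2013/03/04") + "\n"
                + sampleLine("CGAS0Z", "n/a") + "\n"
                + sampleLine("ABCD1h", "2013/03/05") + "\n";
        for (String hit : findMatchingLines(content)) {
            System.out.println(hit);
        }
    }
}
```

If the file is read line by line anyway, as in the question, the flag is not needed and matches() on each line is enough.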

Upvotes: 4
