Anil Bisht

Reputation: 65

TCL Regexp for extracting months from a string

I am expecting strings that have month prefixes like JAN, FEB, MAR...

My regex so far: ...(J[AU][NL]|FEB|MA[RY]|APR|AUG|SEP|OCT|NOV|DEC)...

Can you make it any shorter, or is there a less ugly alternative?

Thanks

Upvotes: 0

Views: 156

Answers (1)

Peter Lewerin

Reputation: 13252

The less ugly, and far more efficient, alternative is to use the in operator from expr.

expr {$month in {JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC}}

or

if {$month in {JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC}} {
    ...
}

This is an order of magnitude faster, clearer to read, and you don't get any false positives (the J[AU][NL] part of your pattern, for example, also matches JAL).
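To make the false-positive point concrete, here is a minimal sketch comparing the two approaches. The helper name isMonth and the anchored variable compact are made up for illustration; the pattern itself is the one from the question, and the in operator needs Tcl 8.5 or later.

```tcl
# Month abbreviations to accept (same list as in the answer).
set months {JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC}

# Hypothetical helper: true if $word is exactly one of the abbreviations.
proc isMonth {word} {
    global months
    return [expr {$word in $months}]
}

# The compact pattern from the question, anchored so that it has to
# match the whole word (the anchors are an addition for this comparison).
set compact {^(J[AU][NL]|FEB|MA[RY]|APR|AUG|SEP|OCT|NOV|DEC)$}

foreach word {JAN JUL JAL} {
    puts "${word}: in=[isMonth $word] regexp=[regexp $compact $word]"
}
```

For JAL the list test reports 0 while the compact regexp reports 1, because J[AU][NL] accepts every combination of the two character classes.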


As Donal Fellows notes, if you must use a regexp, it's better to use an explicit one ((JAN|FEB|…|NOV|DEC)) since it's clearer. Now, I've never ventured into the regex engine source code to see how it works (nor would I unless one of my kids was lost in there), but I'm pretty sure that the recognition chains the engine builds for this expression are at least as efficient as whatever clever abbreviation you or I could come up with.
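If the regexp route is what you need, one way to get the explicit alternation without typing it by hand is to build it from the list. This is only a sketch: the sample string is invented, and the variable names are arbitrary.

```tcl
# Build the explicit alternation (JAN|FEB|...|DEC) from a plain list.
set months {JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC}
set re "([join $months |])"

# Made-up sample input; -> receives the whole match, month the group.
set sample "report-MAR-2015"
if {[regexp $re $sample -> month]} {
    puts "found $month"
}
```

Note that an unanchored pattern like this will also find MAR inside a longer word such as MARCH, so add whatever surrounding context your real strings have.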

Another thing: is there any chance you might want to internationalize the application? Abbreviated month names are the same in most countries (in the West, at least), but there are some differences. With Tcl it's very easy to get localized lists of abbreviated month names either by extracting them from clock or by keeping your own lists and using the msgcat package. If you create your regexp like this:

set re ([join [lmap m {0 1 2 3 4 5 6 7 8 9 10 11} {lindex [::msgcat::mc MONTHS_ABBREV] $m}] |])

and later someone wants to change the language of the application, you just re-create it. It's much harder to do this if you want to craft your own regular expressions as in your question above.
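For the "extracting them from clock" route, a rough sketch could look like the following. It assumes that the %b output of clock format for the given locale is the abbreviation you want, upper-cases it to match the style in the question, and escapes regexp metacharacters, since some locales may add punctuation to abbreviations. The proc name monthRegexp is hypothetical.

```tcl
# Hypothetical helper: build a month alternation for a given locale
# from the %b (abbreviated month name) output of clock format.
proc monthRegexp {locale} {
    set names {}
    foreach m {01 02 03 04 05 06 07 08 09 10 11 12} {
        # Mid-month at noon UTC, so no timezone can shift the month.
        set t [clock scan "2001-$m-15 12:00:00" \
                   -format "%Y-%m-%d %H:%M:%S" -timezone :UTC]
        set name [clock format $t -format %b -locale $locale -timezone :UTC]
        set name [string toupper $name]
        # Escape regexp metacharacters that a localized name might contain.
        lappend names [regsub -all {[][\\^$.|?*+(){}]} $name {\\&}]
    }
    return "([join $names |])"
}

puts [monthRegexp en]
```

Re-creating the pattern for another language is then a matter of calling the proc again with a different locale name; what it returns depends on the locale data shipped with your Tcl installation.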

Upvotes: 3
