user1912491

Reputation: 121

What does (.+_)* mean when using Henry Spencer regular expression library?

With reference to the Henry Spencer regex library, I want to know the difference between (.+_)* and (._)*.

(.+_)* tries to match the string from the back as well. From my understanding, . matches any single character, and .+ means one or more occurrences of that character. _ means a space, {, }, a comma, etc.

Parentheses imply that any one can be considered for a match and the final * signifies 0 or more occurrences.

I feel (._)* would also achieve the same thing. The + after . might be redundant.

Can someone explain to me the subtle difference between the two?

Upvotes: 2

Views: 488

Answers (3)

stema

Reputation: 92976

As far as I know, _ doesn't have a special meaning; it is just a literal "_". See regular-expressions.info.

Your two regexes are not the same.

  1. (._)* will match one character followed by an underscore (if the underscore has a special meaning in your implementation replace "underscore" by that meaning), this sequence will be matched 0 or more times, e.g. "a_%_._?_"

  2. (.+_)* will match at least one character followed by an underscore, this sequence will be matched 0 or more times, e.g. "abc45_%_.;,:_?#'+*~_"

(.+_)* will match everything that can be matched by (._)* but not the other way round.
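The difference can be checked with a small sketch. Note the assumptions: this uses Python's re module, not the Henry Spencer library, and it treats _ as a literal underscore. For patterns this simple the two engines should agree on whether a whole string matches, but that is an assumption rather than something tested against Spencer's code. The names SINGLE, MULTI and whole_match are made up for the illustration.

```python
import re

# Assumption: "_" is a literal underscore, and Python's re stands in
# for the Henry Spencer engine for these simple patterns.
SINGLE = re.compile(r"(._)*")  # exactly one character before each underscore
MULTI = re.compile(r"(.+_)*")  # one or more characters before each underscore


def whole_match(pattern, text):
    """Return True if the pattern matches the entire string."""
    return pattern.fullmatch(text) is not None


# The two example strings from the list above, plus the empty string.
samples = ["a_%_._?_", "abc45_%_.;,:_?#'+*~_", ""]
results = {s: (whole_match(SINGLE, s), whole_match(MULTI, s)) for s in samples}

for s, (single_ok, multi_ok) in results.items():
    print(repr(s), "(._)*:", single_ok, "(.+_)*:", multi_ok)
```

The first string is accepted by both patterns, the second only by (.+_)*, and the empty string by both because * allows zero repetitions.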

Upvotes: 1

Israel Unterman

Reputation: 13510

I don't recall underscore having any special meaning. The special thing about the Henry Spencer regex library is that it combines both regex engine techniques: deterministic and non-deterministic.

This has a pro and a con.

The pro is that your regexps will be the fastest possible even when simply built, while in other engines you might need to use lookahead and advanced regexp techniques (like making it fail early if there is no match) to achieve the same speed.

The con is that the entire regexp will be either greedy or non-greedy. That is, if the first * or + you use is not followed by a ?, then the entire regexp will be greedy, even if you use ? after a later quantifier. If the first * or + you use is followed by a ?, then the entire regexp will be non-greedy.

This makes it slightly trickier to craft the regexp, but only slightly.

The Henry Spencer library is the engine behind Tcl's regexp command, which makes this language very efficient for regexps.

Upvotes: 2

alestanis

Reputation: 21863

For example, aa aa will be matched by (.+_)* but not by (._)* because the latter expects only one character before the space.

Upvotes: 2
