MrPatterns

Reputation: 4434

Need help understanding this particular regular expression [^.]

[^.]+\.(txt|html)

I am learning regex, and am trying to parse this.

[^.] The ^ means "not", and the dot is a wildcard that means any character, so this means find a match with "not any character"? I still don't understand this. Can anyone explain?

The plus is a Kleene Plus which means "1 or more". So now it's "one or more" "not any character".

I get \., it means a period.

(txt|html) means match a txt file or an html file. I think I understand everything after the plus sign. What I don't understand is why it doesn't look something like the DOS equivalent, where I can just write *.txt or *.(txt|html), with * meaning everything that ends in the file extension .txt or .html?

Is [^.] the equivalent of * in DOS?

Upvotes: 3

Views: 72

Answers (2)

Amal Murali

Reputation: 76646

The dot (.) has no special meaning inside a character class, so it does not need to be escaped there.

[^.] means "any character that is not a literal . character". [^.]+ matches one or more occurrences of any character that is not a dot.
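To see this in action, here is a minimal sketch using Python's re module. The question does not name a regex flavor, so Python is an assumption, and the helper name matches and the sample file names are made up for illustration.

```python
import re

# The pattern from the question: one or more non-dot characters,
# then a literal dot, then "txt" or "html".
pattern = re.compile(r"[^.]+\.(txt|html)")

def matches(name):
    """Return True if the whole string matches the pattern."""
    return pattern.fullmatch(name) is not None

print(matches("notes.txt"))        # True
print(matches("index.html"))       # True
print(matches("archive.tar.txt"))  # False: "archive.tar" contains a dot
print(matches(".txt"))             # False: + needs at least one character
print(matches("notes.pdf"))        # False: extension is not txt or html
```

Note that fullmatch requires the entire string to match. With re.search, the same pattern would find "tar.txt" inside "archive.tar.txt", because the match is allowed to start after the first dot.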

From regular-expressions.info:

In most regex flavors, the only special characters or meta-characters inside a character class are the closing bracket (]), the backslash (\), the caret (^), and the hyphen (-). The usual meta-characters are normal characters inside a character class, and do not need to be escaped by a backslash. Your regex will work fine if you escape the regular metacharacters inside a character class, but doing so significantly reduces readability.
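A quick way to check the point about escaping, again assuming Python's re module as the flavor (the sample string is made up): the escaped and unescaped dot behave identically inside a character class, while outside a class the difference matters.

```python
import re

plain = re.compile(r"[^.]+")     # unescaped dot inside the class
escaped = re.compile(r"[^\.]+")  # escaped dot, same meaning

sample = "report.final.txt"
print(plain.findall(sample))    # ['report', 'final', 'txt']
print(escaped.findall(sample))  # ['report', 'final', 'txt']

# Outside a character class the unescaped dot is the wildcard again.
print(re.findall(r".", "a.b"))   # ['a', '.', 'b']
print(re.findall(r"\.", "a.b"))  # ['.']
```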

Upvotes: 7

Gaurang Tandon

Reputation: 6753

. is not special inside a [] character class. [^.]+ means one or more occurrences (+) of any character that is not a dot.

*.txt would not be a valid regex in most flavors, because * is a quantifier meaning "zero or more of the preceding item", and at the start of a pattern there is no preceding item for it to repeat.
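As a rough illustration of the difference between the DOS-style wildcard and a regex, here is a sketch in Python (an assumed flavor; the variable names and file names are hypothetical). The regex counterpart of the wildcard * is .*, meaning "any character, zero or more times", and Python's standard fnmatch module handles shell-style wildcards directly.

```python
import fnmatch
import re

# A leading * has nothing to repeat, so Python's re rejects the pattern.
try:
    re.compile(r"*.txt")
    compiled = True
except re.error:
    compiled = False
print(compiled)  # False

# Regex counterpart of the wildcard *.txt:
# ".*" for "anything", then a literal ".txt".
glob_like = re.compile(r".*\.txt")
print(glob_like.fullmatch("archive.tar.txt") is not None)  # True
print(glob_like.fullmatch("notes.html") is not None)       # False

# Shell-style wildcards are matched by fnmatch, not by re.
print(fnmatch.fnmatchcase("notes.txt", "*.txt"))  # True
```

So [^.]+ is stricter than the DOS *: it refuses any name with a dot before the extension, whereas .* accepts it.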

Upvotes: 0
