Reputation: 4434
[^.]+\.(txt|html)
I am learning regex, and am trying to parse this.
[^.] The ^ means "not", and the dot is a wildcard that means any character, so this means find a match with "not any character"? I still don't understand this. Can anyone explain?
The plus is a Kleene Plus which means "1 or more". So now it's "one or more" "not any character".
I get \. and know it means a literal period.
(txt|html) means match a txt file or an html file. I think I understand everything after the plus sign. What I don't understand is why it doesn't look something like the DOS equivalent, where I can just write *.txt or *.(txt|html), with * meaning everything that ends in the file extension .txt or .html.
Is [^.] the equivalent of * in DOS?
Upvotes: 3
Views: 72
Reputation: 76646
The dot (.) has no special meaning when it's inside a character class, and doesn't need to be escaped there. [^.] means "any character that is not a literal . character". [^.]+ matches one or more occurrences of any character that is not a dot.
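To make this concrete, here is a small sketch using Python's re module (an assumption, since the question doesn't name a regex flavor). It uses fullmatch so the whole string has to fit the pattern; the helper name matches_whole and the sample file names are made up for illustration.

```python
import re

# The pattern from the question, compiled once.
pattern = re.compile(r"[^.]+\.(txt|html)")

def matches_whole(name: str) -> bool:
    """Return True if the entire string fits the pattern."""
    return pattern.fullmatch(name) is not None

print(matches_whole("notes.txt"))        # True
print(matches_whole("index.html"))       # True
print(matches_whole("archive.tar.txt"))  # False: the part before ".txt" contains a dot
print(matches_whole(".txt"))             # False: [^.]+ needs at least one character
print(matches_whole("notes.pdf"))        # False: extension is neither txt nor html
```

Note that with re.search instead of fullmatch, "archive.tar.txt" would still produce a match on the substring "tar.txt", because the pattern itself is not anchored.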
From regular-expressions.info:
In most regex flavors, the only special characters or meta-characters inside a character class are the closing bracket (]), the backslash (\), the caret (^), and the hyphen (-). The usual meta-characters are normal characters inside a character class, and do not need to be escaped by a backslash. Your regex will work fine if you escape the regular metacharacters inside a character class, but doing so significantly reduces readability.
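A quick way to check the point about escaping is to compare the escaped and unescaped forms side by side. This is only a sketch in Python's re module (the flavor is my assumption), with a made-up sample string.

```python
import re

# Inside a character class, "." and "\." mean the same thing: a literal dot.
unescaped = re.compile(r"[^.]+")
escaped = re.compile(r"[^\.]+")

sample = "a.b.c"
print(unescaped.findall(sample))  # ['a', 'b', 'c']
print(escaped.findall(sample))    # ['a', 'b', 'c']

# Outside a character class, an unescaped dot is a wildcard.
print(re.fullmatch(r"a.c", "abc") is not None)    # True: "." matches "b"
print(re.fullmatch(r"a\.c", "abc") is not None)   # False: needs a literal dot
print(re.fullmatch(r"a[.]c", "abc") is not None)  # False: "[.]" is a literal dot too
```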
Upvotes: 7
Reputation: 6753
The dot (.) is not special inside a [] character class. [^.]+ means one or more occurrences (+) of any character which is not a dot.
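If you write *.txt, it would not be a valid regex, because * is a quantifier (zero or more times) and there is nothing before it to repeat. The regex counterpart of the DOS * wildcard is .* rather than * alone.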
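As a rough illustration, Python's re module (my choice of flavor; others may treat a leading * differently) rejects *.txt outright, while the standard fnmatch module handles DOS/shell-style wildcards. The helper is_valid_regex is a hypothetical name used only for this sketch.

```python
import re
from fnmatch import fnmatchcase

def is_valid_regex(pattern: str) -> bool:
    """Return True if Python's re module can compile the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

print(is_valid_regex("*.txt"))     # False: "*" has nothing to repeat
print(is_valid_regex(r".*\.txt"))  # True: ".*" is "any character, zero or more times"

# Wildcard (glob) matching is a different mini-language from regex.
print(fnmatchcase("notes.txt", "*.txt"))                  # True
print(re.fullmatch(r".*\.txt", "notes.txt") is not None)  # True
```

Unlike [^.]+, the .* form also accepts names with extra dots before the extension, which is closer to what *.txt does in a shell.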
Upvotes: 0