Reputation: 4531
I have been tasked with changing an .htaccess file. Unfortunately, I know very little about regular expressions, and so most of the file is unreadable for me. In particular, I have these two REs...
1: ^(?!((www|web3|web4|web5|web6|cm|test)\.mydomain\.com)|(?:(?:\d+\.){3}(?:\d+))$).*$
2: ^/([^/][^/])/([^/][^/])/([^/]+)/Job-Posting/$ /Misc/jobposting\.asp\?country=$1&state=$2&city=$3
For the first one, I understand the first half, more or less. it's trying to match against something that ISN'T www.mydomain.com, or web3.mydomain.com, etc., and that it may match that zero or one times. What I'm not clear on is what the second half of that does. My research suggests that ?:
implies some sort of flag, but I didn't see any example that explained what exactly that meant. Please explain what this part means, as well as provide an example that would match it.
For the second one, the comments say this is applicable for a url containing /US/NY/Rochester/Job-Posting/
. From this I can infer that ^/
means one character, but again, I couldnt find that in my research so far. What is the formal definition of ^/
? What is the significance of putting it into square brackets [^/]
?
If I can get a handle on these two RE I should be able to adapt them to my needs. Your help is much appreciated.
Upvotes: 0
Views: 34
Reputation: 1069
?: doesn't match anything in particular, it modifies the behavior of the parenthesis. The ?: means the parenthesis are non-capturing, and thus cannot be referenced in the rule. Non capturing parens are good to use when you don't need to reference the captured text because the system doesn't have to 'remember' the text, which saves resources.
the code in question:
(?:(?:\d+\.){3}(?:\d+))
matches one or more digits followed by a period times three, then one or more digit. This will match IP addresses (ex 127.0.0.1). This will also match 123456.1.1.3456789, so you might want to restrict the number of characters allowed (?:(?:\d{1,3}.){3}(?:\d{1,3})), thought I haven't tested this so take it with a grain of salt.
Info on non capturing groupings.
The second item revolves around using square brackets as a character set. Square brackets match anything noted inside them, with ^ negating the match. So [ad02] will match any of the four characters a,d,0 or 2, while [^ad02] will match any character that is not a,d,0, or 2. So, ^/ means any character that is not /.
One of the tricky things about square brackets is the number of items they will match. [^/] will match one character, but so does [ad02]. It doesn't matter how many characters you have in the set, it still obeys the modifiers on the brackets. So [^/]{3} will match any series of 3 characters that does not contain a forward slash, while [^/]{2} will match a 2 character string with the same restriction.
For more info on character sets see Character Classes or Character Sets
Upvotes: 1