Reputation: 770
I have these possible matches
https://www.facebook.com/tr?id=13046212397316299911&ev=pageview&noscript=1
https://www.facebook.com/pages/something
https://www.facebook.com/groups/something/
https://www.facebook.com/something
... random other non-facebook links
The last three are valid, but I want to rule out the first one using preg_match_all.
My current regex matches all four of them; for the first one, it matches on
https://www.facebook.com/tr
But I want to rule it out completely.
This is my current regex:
$pattern = "/(?:(?:http|https):\/\/|)(?:www\.|)facebook\.[a-z.]+\/((pages|groups)\/|)[a-zA-Z0-9\-_]{1,}/"
Also, when it does match, $matches contains entries at [0], [1] and [2], and I don't know why. I just want a match or no match.
Any help please?
$links = [
    'https://www.facebook.com/tr?id=13046212397316299911&ev=pageview&noscript=1',
    'https://www.facebook.com/pages/something',
    'https://www.facebook.com/groups/something/',
    'https://www.facebook.com/something',
];
$pattern = "/(?:(?:http|https):\/\/|)(?:www\.|)facebook\.[a-z.]+\/((pages|groups)\/|)[a-zA-Z0-9\-_]{1,}/";
foreach ($links as $link) {
    // preg_match() returns 1 on a match, so a single check is enough
    if (preg_match($pattern, $link, $matches)) {
        // d() is a debug-dump helper; var_dump($matches) would do the same job
        d($matches);
    }
}
array (3) [
0 => string (41) "https://www.facebook.com/groups/something"
1 => string (7) "groups/"
2 => string (6) "groups"
]
Upvotes: 0
Views: 151
Reputation: 163467
In your pattern you use two alternations where you have no value after the last | (an empty alternative). The http or https part can be shortened to https? and that part, including the www. part, does not have to be in a non-capturing group (?:...).
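As a small sketch of what the groups do to $matches: every plain (...) group adds an entry to the array, while (?:...) groups without capturing. The first pattern below is the one from the question; $nonCapturing is a hypothetical variant written only for this comparison.

```php
<?php
// The question's pattern: two capturing groups, one nested inside the other.
$original = "/(?:(?:http|https):\/\/|)(?:www\.|)facebook\.[a-z.]+\/((pages|groups)\/|)[a-zA-Z0-9\-_]{1,}/";

// Hypothetical variant: same pattern, both groups made non-capturing.
$nonCapturing = "/(?:(?:http|https):\/\/|)(?:www\.|)facebook\.[a-z.]+\/(?:(?:pages|groups)\/|)[a-zA-Z0-9\-_]{1,}/";

$link = 'https://www.facebook.com/groups/something/';

preg_match($original, $link, $withGroups);
preg_match($nonCapturing, $link, $withoutGroups);

// $withGroups holds the full match plus one entry per capturing group.
// $withoutGroups holds only the full match at index 0.
var_dump(count($withGroups), count($withoutGroups));
```

If only a yes/no answer is needed, the third argument to preg_match can be left out entirely and the return value checked on its own.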
You could move the forward slash into the group to match pages or groups and make the group optional using a question mark. Then match an optional forward slash at the end.
If you use a delimiter other than /, such as ~, you don't have to escape the forward slashes.
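To illustrate the delimiter point, here is a minimal sketch with a deliberately shortened expression (both patterns are made up for this comparison, not the final regex). They describe the same thing; only the delimiter and the escaping differ.

```php
<?php
// With "/" as the delimiter, every literal slash needs a backslash.
$slashDelimited = '/^https:\/\/www\.facebook\.com\/pages\//';

// With "~" as the delimiter, the slashes can be written as-is.
$tildeDelimited = '~^https://www\.facebook\.com/pages/~';

$link = 'https://www.facebook.com/pages/something';

$a = preg_match($slashDelimited, $link);
$b = preg_match($tildeDelimited, $link);

// Both calls report a match for the same input.
var_dump($a === 1 && $b === 1);
```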
Your regex might look like:
^https://www\.facebook\.[a-z.]+/(?:pages/|groups/)?[\w-]+/?$
For example:
$pattern = '~^https://www\.facebook\.[a-z.]+/(?:pages/|groups/)?[\w-]+/?$~';
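A sketch of how that pattern could be applied to the links from the question when only a match or no match is wanted. The function name isFacebookPageLink is hypothetical; preg_match returns 1 on a match, 0 on no match, and false on error.

```php
<?php
$links = [
    'https://www.facebook.com/tr?id=13046212397316299911&ev=pageview&noscript=1',
    'https://www.facebook.com/pages/something',
    'https://www.facebook.com/groups/something/',
    'https://www.facebook.com/something',
];

// Hypothetical helper: true when the whole link matches the anchored pattern.
function isFacebookPageLink(string $link): bool
{
    $pattern = '~^https://www\.facebook\.[a-z.]+/(?:pages/|groups/)?[\w-]+/?$~';

    // No $matches argument: only the match / no match result is needed.
    return preg_match($pattern, $link) === 1;
}

// Keep only the links that match; array_values() reindexes from 0.
$valid = array_values(array_filter($links, 'isFacebookPageLink'));

print_r($valid);
```

Because the pattern is anchored with ^ and $, the tracking link no longer matches partially on https://www.facebook.com/tr; the ? after tr makes the whole match fail.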
If you want to match more, but not the query string parameters, you could match one or more characters that are neither a question mark nor a whitespace character, using the negated character class [^?\s]+.
^https://www\.facebook\.[a-z.]+/(?:pages/|groups/)?[^?\s]+$
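A short sketch comparing the two patterns. The links with a deeper path and with a dot in the name are made-up examples, used only to show where the looser pattern accepts more than the strict one.

```php
<?php
$strict = '~^https://www\.facebook\.[a-z.]+/(?:pages/|groups/)?[\w-]+/?$~';
$loose  = '~^https://www\.facebook\.[a-z.]+/(?:pages/|groups/)?[^?\s]+$~';

$tracking = 'https://www.facebook.com/tr?id=13046212397316299911&ev=pageview&noscript=1';
$deeper   = 'https://www.facebook.com/groups/something/about'; // hypothetical link
$dotted   = 'https://www.facebook.com/some.name';              // hypothetical link

// Collect match / no match for each link under both patterns.
$results = [];
foreach ([$tracking, $deeper, $dotted] as $link) {
    $results[$link] = [
        'strict' => preg_match($strict, $link) === 1,
        'loose'  => preg_match($loose, $link) === 1,
    ];
}

var_dump($results);
```

Both patterns reject the tracking link because of the ? in it; only the looser one accepts the extra path segment and the dot.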
Upvotes: 1