vesperknight

Reputation: 770

PHP regex to match Facebook pages, groups and usernames, but ignore links with query params

I have these possible matches

https://www.facebook.com/tr?id=13046212397316299911&ev=pageview&noscript=1
https://www.facebook.com/pages/something
https://www.facebook.com/groups/something/
https://www.facebook.com/something
... random other non-facebook links

The last 3 are valid, but I want to rule out the first one when using preg_match_all.

Currently

I have this regex which includes all 4 of them, and for the first one, it matches on

https://www.facebook.com/tr

But I want to rule it out completely

This is my current regex

 $pattern = "/(?:(?:http|https):\/\/|)(?:www\.|)facebook\.[a-z.]+\/((pages|groups)\/|)[a-zA-Z0-9\-_]{1,}/";

Also, when it does match, $matches contains entries at [0], [1] and [2], and I don't know why. I just want a match or no match.

Any help please?

    $links = [
        'https://www.facebook.com/tr?id=13046212397316299911&ev=pageview&noscript=1',
        'https://www.facebook.com/pages/something',
        'https://www.facebook.com/groups/something/',
        'https://www.facebook.com/something',
    ];

    $pattern = "/(?:(?:http|https):\/\/|)(?:www\.|)facebook\.[a-z.]+\/((pages|groups)\/|)[a-zA-Z0-9\-_]{1,}/";

    foreach ($links as $link) {
        // preg_match() returns 1 on a match, 0 on no match and false on error
        if (preg_match($pattern, $link, $matches)) {
            d($matches); // d() is a debug dump helper, not a PHP built-in
        }
    }

array (3) [
  0 => string (41) "https://www.facebook.com/groups/something"
  1 => string (7) "groups/"
  2 => string (6) "groups"
]

Upvotes: 0

Views: 151

Answers (1)

The fourth bird

Reputation: 163467

In your pattern you use 2 alternations with nothing after the last |, which makes each group optional by letting it match the empty string. The http or https part can be shortened to https? and that part, including the www., does not have to be in a non capturing group (?:...).

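You could move the forward slash into the group that matches pages or groups and make the group optional using a question mark. Then match an optional forward slash at the end. Anchoring the pattern with ^ and $ forces it to cover the whole string, so a link with a query string can no longer match partially.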

If you use a different delimiter than /, like ~, you don't have to escape the forward slashes.
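As a side note on the [1] and [2] entries in the question's output: every pair of plain parentheses is a capture group and adds an entry to $matches, while a (?:...) group does not. The snippet below is only a small sketch of that difference; the variable names are made up for the illustration, and both patterns are left unanchored on purpose so that only the grouping differs.

```php
<?php
$url = 'https://www.facebook.com/groups/something';

// Capturing groups with an empty alternative, as in the question:
// two extra entries end up in $matches.
$capturing = '~facebook\.[a-z.]+/((pages|groups)/|)[a-zA-Z0-9\-_]+~';
preg_match($capturing, $url, $withGroups);

// Optional non-capturing group: only the full match is stored.
$nonCapturing = '~facebook\.[a-z.]+/(?:(?:pages|groups)/)?[\w-]+~';
preg_match($nonCapturing, $url, $withoutGroups);

var_dump(count($withGroups));    // int(3)
var_dump(count($withoutGroups)); // int(1)
```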

Your regex might look like:

^https://www\.facebook\.[a-z.]+/(?:pages/|groups/)?[\w-]+/?$


For example:

$pattern = '~^https://www\.facebook\.[a-z.]+/(?:pages/|groups/)?[\w-]+/?$~';
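A minimal sketch of how that pattern behaves on the four links from the question. It uses preg_grep to filter the array in one call, which is just one way to get a plain match or no match per link; $valid is a name chosen for the example.

```php
<?php
$links = [
    'https://www.facebook.com/tr?id=13046212397316299911&ev=pageview&noscript=1',
    'https://www.facebook.com/pages/something',
    'https://www.facebook.com/groups/something/',
    'https://www.facebook.com/something',
];

$pattern = '~^https://www\.facebook\.[a-z.]+/(?:pages/|groups/)?[\w-]+/?$~';

// preg_grep() keeps the entries that match; array_values() reindexes from 0.
$valid = array_values(preg_grep($pattern, $links));

// The tr?id=... link is dropped because the ? cannot be matched before $.
print_r($valid);
```

Because the pattern has no capture groups, calling preg_match with it on a single link would leave only index 0 in $matches.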

If you want to match more but not the querystring params, you could match one or more characters other than a question mark or a whitespace char using a negated character class [^?\s]+.

^https://www\.facebook\.[a-z.]+/(?:pages/|groups/)?[^?\s]+$
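Since the question mentions preg_match_all, here is a hedged sketch of this second pattern applied to a block of text. It assumes the links sit one per line, which the question does not state, and adds the m flag so that ^ and $ anchor at line boundaries. The sample text, including the something/else path and the example.com link, is invented for the illustration.

```php
<?php
// Assumed input format: one link per line.
$text = implode("\n", [
    'https://www.facebook.com/tr?id=13046212397316299911&ev=pageview&noscript=1',
    'https://www.facebook.com/pages/something',
    'https://www.facebook.com/groups/something/',
    'https://www.facebook.com/something/else',
    'https://example.com/not-facebook',
]);

// The m flag makes ^ and $ match at the start and end of every line.
$pattern = '~^https://www\.facebook\.[a-z.]+/(?:pages/|groups/)?[^?\s]+$~m';

// $found[0] holds the full matches; there are no capture groups to fill.
$count = preg_match_all($pattern, $text, $found);

var_dump($count); // int(3)
print_r($found[0]);
```

The negated class also accepts further path segments such as something/else, which the stricter [\w-]+ version would reject.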


Upvotes: 1
