Reputation: 91
I am looking to find specific URLs within a large string of text. The URLs are in this format:
https://name.myurl.com/#/shop/rmpa8cmnfg3eerpus3ap9jwekz6k77pnj2pg50ua/login
*The long segment between /shop/ and /login (bold in the original post) is random.
Currently I am able to extract ALL URLs using the following:
preg_match_all('!https?://\S+!', $string, $matches);
I then need to loop around and pull out all URLs that include a specific string using:
$arr = $matches[0];
foreach ($arr as $haystack) {
    if (strlen(strstr($haystack, "shop")) > 0) {
        echo $haystack;
    }
}
I am trying to make the code more efficient and can't seem to nail down a regular expression that can find all URLs matching:
https://name.myurl.com/#/shop/rmpa8cmnfg3eerpus3ap9jwekz6k77pnj2pg50ua/login
If I could, it would eliminate the need for the second string lookup.
Any help would be much appreciated.
Thanks
Upvotes: 1
Views: 456
Reputation: 8332
If all you want to do is to verify that the /shop/ part is part of the URL, use:
https?:\/\/\S*\/shop\/\S*
It's basically your regex, with the addition of requiring /shop/ after the protocol part (http(s)://), and allowing non-space characters before and after the /shop/ part.
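To show how this replaces both the preg_match_all call and the follow-up loop from the question, here is a minimal sketch. The sample text in $string is made up for illustration, and it uses "~" as the delimiter (an assumption on my part, not required) so the slashes need no escaping:

```php
<?php
// Made-up sample text: only the first URL contains a /shop/ segment.
$string = 'Visit https://name.myurl.com/#/shop/rmpa8cmnfg3eerpus3ap9jwekz6k77pnj2pg50ua/login '
        . 'or https://name.myurl.com/#/about for details.';

// Same pattern as above, written with "~" delimiters instead of escaped slashes.
preg_match_all('~https?://\S*/shop/\S*~', $string, $matches);

// $matches[0] now holds only the URLs that contain /shop/.
foreach ($matches[0] as $url) {
    echo $url, PHP_EOL;
}
```

Note that \S* stops only at whitespace, so punctuation directly after a URL (a trailing comma or full stop, for example) would be captured as part of the match.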
Regards
Upvotes: 1
Reputation: 626709
The point is that you need to ask yourself what is so particular about the string you need to match: whether the URL contains a subpath of interest, whether that subpath is the second one or the second from the end, whether it consists of both letters and digits, etc.
Once you know what to match, you can start on a regex.
It seems that you need to match URLs with a /shop/ subpath. Then, all you need is to include that subpattern in the pattern. Since it is a literal sequence of characters, there is nothing difficult about it:
'~https?://\S+/shop/\S+~'
              ^^^^^^
See the regex demo
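If the URLs really do follow the exact shape shown in the question, the same reasoning can be pushed further. The sketch below is only an illustration under two assumptions that the question does not confirm: that the random segment consists solely of lowercase letters and digits, and that the URL always ends in /login. The sample text in $string is made up.

```php
<?php
// Assumptions (not confirmed by the question): the random segment is
// lowercase letters and digits only, and the URL always ends in /login.
$pattern = '~https?://\S+?/#/shop/[a-z0-9]+/login~';

// Made-up sample text: one full shop login URL, one shop URL with no
// token, and one URL with a different subpath.
$string = 'a https://name.myurl.com/#/shop/rmpa8cmnfg3eerpus3ap9jwekz6k77pnj2pg50ua/login, '
        . 'b https://name.myurl.com/#/shop/ c https://name.myurl.com/#/cart/abc123/login';

preg_match_all($pattern, $string, $matches);

// Only the first URL matches, and the trailing comma is not captured
// because the pattern ends at the literal /login.
print_r($matches[0]);
```

Because the pattern ends on a literal instead of \S+, punctuation that follows the URL in the text stays out of the match. Widen the character class if the token can contain other characters.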
Upvotes: 1