Reputation: 4810
What would be the best regular expression for this scenario?
Given this URL:
http://php.net/manual/en/function.preg-match.php
How should I go about selecting everything between (but not including) http://php.net and .php:
/manual/en/function.preg-match
This is for an Nginx configuration file.
Upvotes: 11
Views: 1260051
Reputation: 32532
There's no need to use a regular expression to dissect a URL. PHP has built-in functions for this, pathinfo() and parse_url().
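As a rough illustration of what these two functions return for the URL in the question (the variable names are only for illustration):

```php
<?php
$url = 'http://php.net/manual/en/function.preg-match.php';

// parse_url() with PHP_URL_PATH returns only the path component.
$path = parse_url($url, PHP_URL_PATH);

// pathinfo() then splits that path into directory, file name and extension.
$dir  = pathinfo($path, PATHINFO_DIRNAME);
$name = pathinfo($path, PATHINFO_FILENAME);
$ext  = pathinfo($path, PATHINFO_EXTENSION);

echo $path, "\n"; // /manual/en/function.preg-match.php
echo $dir, "\n";  // /manual/en
echo $name, "\n"; // function.preg-match
echo $ext, "\n";  // php
```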
Upvotes: 3
Reputation: 26930
Like this:
if (preg_match('/(?<=net).*(?=\.php)/', $subject, $regs)) {
    $result = $regs[0];
}
Explanation:
(?<=    # Assert that the regex below can be matched, with the match ending at this position (positive lookbehind)
  net   # Match the characters “net” literally
)
.       # Match any single character that is not a line break character
*       # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
(?=     # Assert that the regex below can be matched, starting at this position (positive lookahead)
  \.    # Match the character “.” literally
  php   # Match the characters “php” literally
)
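One thing worth checking with this pattern is what happens when "net" appears earlier in the host name. The sketch below uses a made-up host (internet.example.net) to show the effect, and then one possible stricter variant that anchors on the scheme and host; that variant is an assumption for illustration, not part of the answer above.

```php
<?php
$pattern = '/(?<=net).*(?=\.php)/';

// The URL from the question: "net" occurs only once, at the end of the host.
preg_match($pattern, 'http://php.net/manual/en/function.preg-match.php', $a);
echo $a[0], "\n"; // /manual/en/function.preg-match

// Hypothetical host containing "net" earlier: the match starts too soon.
preg_match($pattern, 'http://internet.example.net/page.php', $b);
echo $b[0], "\n"; // .example.net/page

// Possible stricter variant (assumption): skip scheme and host explicitly,
// then capture the path up to a trailing ".php".
preg_match('~^https?://[^/]+(?<path>/.*)\.php$~', 'http://internet.example.net/page.php', $c);
echo $c['path'], "\n"; // /page
```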
Upvotes: 8
Reputation: 173562
Just for the fun of it, here are two ways that have not been explored:
substr($s, strpos($s, '/', 8), -4)
Or:
substr($s, strpos($s, '/', 8), -strlen($s) + strrpos($s, '.'))
Both are based on the idea that the HTTP schemes http:// and https:// are at most 8 characters long, so it typically suffices to look for the first slash from the 9th character (offset 8) onwards. If the extension is always .php, the first snippet will work; otherwise the second one is required.
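To make the offsets concrete, here is a small sketch that prints the intermediate values for the URL in the question and then compares both forms on a second, made-up URL with a different extension ($t and example.com are hypothetical):

```php
<?php
$s = 'http://php.net/manual/en/function.preg-match.php';

$start = strpos($s, '/', 8); // 14: first slash after the scheme and host
$dot   = strrpos($s, '.');   // 44: last dot in the string

echo substr($s, $start, -4), "\n";                // assumes a ".php" suffix
echo substr($s, $start, $dot - strlen($s)), "\n"; // works for any extension

// Hypothetical URL with a longer extension.
$t = 'https://example.com/docs/page.html';
$first  = substr($t, strpos($t, '/', 8), -4);
$second = substr($t, strpos($t, '/', 8), strrpos($t, '.') - strlen($t));

echo $first, "\n";  // /docs/page.  (the dot is left behind)
echo $second, "\n"; // /docs/page
```

Both forms assume the path actually ends in an extension; without one, strrpos() would find a dot in the host name and the result would not be meaningful.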
For a pure regular expression solution you can break the string down like this:
~^(?:[^:/?#]+:)?(?://[^/?#]*)?([^?#]*)~
The path portion ends up in the first capture group (i.e. index 1), which is the final ([^?#]*) in the expression. Removing the extension can then be done using pathinfo():
$parts = pathinfo($matches[1]);
echo $parts['dirname'] . '/' . $parts['filename'];
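Put together with the preg_match() call, a minimal runnable sketch could look like this (the query string and fragment are added here only to show that they are left out of group 1; $result is an illustrative name):

```php
<?php
$url = 'http://php.net/manual/en/function.preg-match.php?foo=bar#top';
$result = null;

// Group 1 captures the path: everything after the host up to "?" or "#".
if (preg_match('~^(?:[^:/?#]+:)?(?://[^/?#]*)?([^?#]*)~', $url, $matches)) {
    $parts  = pathinfo($matches[1]);
    $result = $parts['dirname'] . '/' . $parts['filename'];
}

echo $result, "\n"; // /manual/en/function.preg-match
```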
You can also tweak the expression to this:
([^?#]*?)(?:\.[^?#]*)?(?:\?|$)
This expression is not optimal, though, because it involves some backtracking. In the end I would go for something less custom:
$parts = pathinfo(parse_url($url, PHP_URL_PATH));
echo $parts['dirname'] . '/' . $parts['filename'];
Upvotes: 2
Reputation: 3682
http:[\/]{2}.+?[.][^\/]+(.+)[.].+
Let's see what it does:
http:[\/]{2}.+?[.][^\/]+
- matches, without capturing, the scheme and host: http://php.net
(.+)[.]
- captures everything up to the last dot: /manual/en/function.preg-match
[.].+
- matches the file extension, e.g. .php
Upvotes: -1
Reputation: 8731
re> |(?<=\w)/.+(?=\.\w+$)|
Compile time 0.0011 milliseconds
Memory allocation (code space): 32
Study time 0.0002 milliseconds
Capturing subpattern count = 0
No options
First char = '/'
No need char
Max lookbehind = 1
Subject length lower bound = 2
No set of starting bytes
data> http://php.net/manual/en/function.preg-match.php
Execute time 0.0007 milliseconds
0: /manual/en/function.preg-match

re> |//[^/]*(.*)\.\w+$|
Compile time 0.0010 milliseconds
Memory allocation (code space): 28
Study time 0.0002 milliseconds
Capturing subpattern count = 1
No options
First char = '/'
Need char = '.'
Subject length lower bound = 4
No set of starting bytes
data> http://php.net/manual/en/function.preg-match.php
Execute time 0.0005 milliseconds
0: //php.net/manual/en/function.preg-match.php
1: /manual/en/function.preg-match

re> |/[^/]+(.*)\.|
Compile time 0.0008 milliseconds
Memory allocation (code space): 23
Study time 0.0002 milliseconds
Capturing subpattern count = 1
No options
First char = '/'
Need char = '.'
Subject length lower bound = 3
No set of starting bytes
data> http://php.net/manual/en/function.preg-match.php
Execute time 0.0005 milliseconds
0: /php.net/manual/en/function.preg-match.
1: /manual/en/function.preg-match

re> |/[^/]+\K.*(?=\.)|
Compile time 0.0009 milliseconds
Memory allocation (code space): 22
Study time 0.0002 milliseconds
Capturing subpattern count = 0
No options
First char = '/'
No need char
Subject length lower bound = 2
No set of starting bytes
data> http://php.net/manual/en/function.preg-match.php
Execute time 0.0005 milliseconds
0: /manual/en/function.preg-match

re> |\w+\K/.*(?=\.)|
Compile time 0.0009 milliseconds
Memory allocation (code space): 22
Study time 0.0003 milliseconds
Capturing subpattern count = 0
No options
No first char
Need char = '/'
Subject length lower bound = 2
Starting byte set: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
data> http://php.net/manual/en/function.preg-match.php
Execute time 0.0011 milliseconds
0: /manual/en/function.preg-match
Upvotes: 0
Reputation: 313
Simple:
$url = "http://php.net/manual/en/function.preg-match.php";
preg_match("/http:\/\/php\.net(.+)\.php/", $url, $matches);
echo $matches[1];
$matches[0] is the full match (here, the whole URL); $matches[1] is the part you want.
See for yourself: http://codepad.viper-7.com/hHmwI2
Upvotes: 0
Reputation: 2621
Here's a regex solution better than what most have provided so far, if you ask me: http://regex101.com/r/nQ8rH5
/http:\/\/[^\/]+\K.*(?=\.[^.]+$)/i
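For reference, a short sketch of how this pattern could be used from PHP. \K resets the start of the reported match, so the scheme and host are matched but not returned, and the lookahead stops before the final extension:

```php
<?php
$url = 'http://php.net/manual/en/function.preg-match.php';
$result = null;

// \K drops "http://php.net" from the reported match;
// (?=\.[^.]+$) requires a final ".extension" right after the match.
if (preg_match('/http:\/\/[^\/]+\K.*(?=\.[^.]+$)/i', $url, $m)) {
    $result = $m[0];
}

echo $result, "\n"; // /manual/en/function.preg-match
```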
Upvotes: 0
Reputation: 6029
Regular expression for matching everything after "net" and before ".php":
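$pattern = "~net([a-zA-Z0-9_/.-]*)\.php~";
In the above regular expression, the group enclosed in "()" captures the part you are looking for. The character class has to include "/", "." and "-", otherwise it cannot match a path such as /manual/en/function.preg-match.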
Hope it's useful.
Upvotes: -1
Reputation:
A regular expression might not be the most effective tool for this job.
Try using parse_url(), combined with pathinfo():
$url = 'http://php.net/manual/en/function.preg-match.php';
$path = parse_url($url, PHP_URL_PATH);
$pathinfo = pathinfo($path);
echo $pathinfo['dirname'], '/', $pathinfo['filename'];
The above code outputs:
/manual/en/function.preg-match
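One edge case to be aware of: for a file directly under the root, pathinfo() reports the directory as "/", so the plain concatenation would produce a double slash. A possible wrapper that handles this is sketched below; pathWithoutExtension() is a hypothetical helper name, not a PHP built-in, and the sketch assumes forward-slash paths as found in URLs on a Unix-like system.

```php
<?php
// Hypothetical helper (not part of PHP): URL path without its last extension.
function pathWithoutExtension(string $url): string
{
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path) || $path === '') {
        return ''; // no path component, or the URL could not be parsed
    }

    $info = pathinfo($path);
    // dirname is "/" for top-level files; trim it to avoid "//index".
    $dir = rtrim($info['dirname'], '/');

    return $dir . '/' . $info['filename'];
}

echo pathWithoutExtension('http://php.net/manual/en/function.preg-match.php'), "\n";
// /manual/en/function.preg-match
echo pathWithoutExtension('http://php.net/index.php'), "\n";
// /index
```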
Upvotes: 20
Reputation: 15159
This general URL match allows you to select parts of a URL:
if (preg_match('/\\b(?P<protocol>https?|ftp):\/\/(?P<domain>[-A-Z0-9.]+)(?P<file>\/[-A-Z0-9+&@#\/%=~_|!:,.;]*)?(?P<parameters>\\?[-A-Z0-9+&@#\/%=~_|!:,.;]*)?/i', $subject, $regs)) {
    $result = $regs['file'];
    // or you can append $regs['parameters'] too
} else {
    $result = "";
}
Upvotes: 0
Reputation: 8550
Try this:
preg_match("/net(.*)\.php$/","http://php.net/manual/en/function.preg-match.php", $matches);
echo $matches[1];
// prints /manual/en/function.preg-match
Upvotes: 3