Stefano Maglione

Reputation: 4150

Regex to extract URL substrings

Is there a way to write a one-line regex that captures only specific parts of a URL like this one?

ftp://trial.com:50/papers/history.pdf

getting only ftp, trial.com and 50.

market://find/tools/new

getting only market and find

Upvotes: 1

Views: 597

Answers (3)

elias

Reputation: 15480

(\w+):\/\/([\w\.]+)(:(\d+))?.*

Or a less restrictive version (be careful: .+? accepts any characters as the protocol, not just word characters):

(.+?):\/\/([^:\/\?]+)(:(\d+))?.*

And the groups:

$1 is the protocol
$2 is the domain
$4 is the port (optional)

Examples and explanations here.
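
As a rough illustration of how those groups come out in practice, here is a minimal sketch in Python using the first (stricter) pattern. The function name url_parts is made up for this example, and Python does not need the slashes escaped:

```python
import re

# The stricter pattern from this answer; "/" needs no escaping in Python.
URL_PARTS = re.compile(r"(\w+)://([\w.]+)(:(\d+))?.*")

def url_parts(url):
    """Return (protocol, domain, port) or None if the pattern does not match.

    The port is None when the URL has no ":<digits>" after the domain.
    """
    m = URL_PARTS.match(url)
    if m is None:
        return None
    # Group 3 is the optional ":port" wrapper, so the digits are in group 4.
    return m.group(1), m.group(2), m.group(4)

print(url_parts("ftp://trial.com:50/papers/history.pdf"))  # ('ftp', 'trial.com', '50')
print(url_parts("market://find/tools/new"))                # ('market', 'find', None)
```

Because the port group is optional, callers should expect None for the third element rather than an empty string.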

Upvotes: 0

e0k

Reputation: 7161

I think the question is how to extract part of the matching string, not how to match the whole string. Some tools let you tag substrings with parentheses (which must be escaped in sed's default syntax) for this purpose. Consider this example with sed:

 echo ftp://trial.com/hist.pdf | sed 's/^\(.\+\):\/\/\([^\/]\+\)\/\?.*$/\1 \2/'

The sed command has the form s/regexp/replacement/: it matches regexp and replaces the match with replacement. The first pair of escaped parentheses tags the .\+ part, which is printed in the output as \1. The second pair tags what comes after the // and before the next /, which is printed as \2. Using \+ means one or more occurrences, as opposed to *, which means zero or more. The parentheses must be escaped to tag substrings for use in the replacement; unescaped, they match literal parenthesis characters.

The ^ anchors the match at the beginning of the line. .\+ is one or more of any character. The :\/\/ matches the literal ://. The [^\/]\+ inside the second pair of parentheses is one or more characters that are not /, and it is followed by \/\? (an optional /). Lastly, .*$ matches everything up to the end of the line.
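
For comparison, the same substitution can be sketched in Python, where capture groups use unescaped parentheses. This is only a translation of the sed expression above, and the function name scheme_and_host is hypothetical:

```python
import re

def scheme_and_host(url):
    """Mimic the sed substitution: keep only what is before '://'
    and what lies between '://' and the next '/'."""
    # \1 is the part before "://", \2 is everything up to the next "/".
    return re.sub(r"^(.+)://([^/]+)/?.*$", r"\1 \2", url)

print(scheme_and_host("ftp://trial.com/hist.pdf"))  # ftp trial.com
print(scheme_and_host("market://find/tools/new"))   # market find
```

Note that this pattern does not split off a port: for ftp://trial.com:50/papers/history.pdf the second group is trial.com:50, so a third group would be needed to get 50 on its own.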

Upvotes: 0

user4227915


Try this regex:

\/\/|\/.*|(\w+)

Regex live here.

Explanation:

            # match without grouping what you do not want
\/\/        # two slashes
|           # OR
\/.*        # a single slash and everything after it
|           # OR
            # now match grouping what you want
(\w+)       # each desired word in group 1
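
A minimal sketch of how this alternation behaves when applied repeatedly, here with Python's re.findall. The helper name wanted_words and the WITH_DOTS variant are assumptions added for illustration, not part of the regex above:

```python
import re

# The regex from this answer (slashes need no escaping in Python).
ORIGINAL = re.compile(r"//|/.*|(\w+)")
# Hypothetical variant: also allow dots, so a domain stays in one piece.
WITH_DOTS = re.compile(r"//|/.*|([\w.]+)")

def wanted_words(url, pattern=ORIGINAL):
    """Collect group 1 from every match, skipping matches of the
    ungrouped alternatives (findall reports those as empty strings)."""
    return [word for word in pattern.findall(url) if word]

url = "ftp://trial.com:50/papers/history.pdf"
print(wanted_words(url))                         # ['ftp', 'trial', 'com', '50']
print(wanted_words(url, WITH_DOTS))              # ['ftp', 'trial.com', '50']
print(wanted_words("market://find/tools/new"))   # ['market', 'find']
```

Since \w does not match a dot, the original pattern returns trial and com as separate words; the variant keeps trial.com together, at the cost of accepting any run of word characters and dots.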

Hope it helps

Upvotes: 1
