Reputation: 4150
Is there a way to write a one-line regex that captures only specific parts of a URL like this?
ftp://trial.com:50/papers/history.pdf
getting only ftp, trial.com and 50.
market://find/tools/new
getting only market and find
Upvotes: 1
Views: 597
Reputation: 15480
(\w+):\/\/([\w\.]+)(:(\d+))?.*
Or a less restrictive version (be careful: .+? accepts any characters as the protocol, not just word characters):
(.+?):\/\/([^:\/\?]+)(:(\d+))?.*
And the groups:
$1 is the protocol
$2 is the domain
$4 is the port (optional)
Examples and explanations here.
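For anyone who wants to try the first pattern directly, here is a minimal sketch in Python. The choice of Python and the helper name split_url are assumptions for illustration only; the question does not name a language. The slashes are left unescaped because Python does not need them escaped.

```python
import re

# First pattern from this answer; "/" needs no escaping in Python.
URL_RE = re.compile(r"(\w+)://([\w.]+)(:(\d+))?.*")

def split_url(url):
    """Hypothetical helper: return (protocol, domain, port) or None.

    The port comes from group 4 (group 3 would include the colon)
    and is None when the URL has no port.
    """
    m = URL_RE.match(url)
    if m is None:
        return None
    return m.group(1), m.group(2), m.group(4)

print(split_url("ftp://trial.com:50/papers/history.pdf"))
# ('ftp', 'trial.com', '50')
print(split_url("market://find/tools/new"))
# ('market', 'find', None)
```

Group 3 exists only to make the colon and port optional together, which is why the port is read from group 4.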
Upvotes: 0
Reputation: 7161
I think the question is how to extract part of the matching string, not how to match the whole string. Some tools let you use parentheses (which must be escaped in sed's basic regex syntax) to capture substrings for this purpose. Consider this example with sed:
echo ftp://trial.com/hist.pdf | sed 's/^\(.\+\):\/\/\([^\/]\+\)\/\?.*$/\1 \2/'
The sed command is s/regexp/replacement/, so it matches the regexp and replaces it with the replacement. The first pair of parentheses tags the .\+ part, which is printed in the output with \1. The part between the second pair of parentheses is what comes after the // and before the next /; it is printed with \2 in the replacement. Using \+ means one or more occurrences, whereas * means zero or more. The parentheses must be escaped to tag the substrings for use in the replacement; otherwise they match literal parenthesis characters.
The ^ signifies the beginning of the line. .\+ is at least one character of anything. The :\/\/ matches the ://. The [^\/]\+ between the second parentheses is at least one character that is not /, followed by \/\? (an optional /). Lastly, the .*$ is everything until the end of the line.
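The same substitution can be sketched in Python for comparison. This is only an illustration, not part of the sed answer: the function name extract_parts is hypothetical, and Python's re syntax uses unescaped parentheses, + and ? where sed's basic syntax needs backslashes.

```python
import re

def extract_parts(url):
    """Hypothetical helper mirroring the sed command above.

    Group 1 is everything before "://", group 2 is everything
    up to the next "/"; the rest of the line is discarded.
    """
    return re.sub(r"^(.+)://([^/]+)/?.*$", r"\1 \2", url)

print(extract_parts("ftp://trial.com/hist.pdf"))
# ftp trial.com
print(extract_parts("ftp://trial.com:50/papers/history.pdf"))
# ftp trial.com:50
```

Because the second group only stops at a slash, a port stays attached to the host; splitting it off would need an extra group such as the (:(\d+))? used in the other answer.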
Upvotes: 0
Reputation:
Try this regex:
\/\/|\/.*|(\w+)
Explaining:
# match without grouping what you do not want
\/\/ # two slashes
| # OR
\/.* # the first single slash and everything after it
| # OR
# now match grouping what you want
(\w+) # each desired word in group 1
Hope it helps
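A rough way to test this pattern is a find-all loop, sketched here in Python (an assumption; the question does not name a language, and the helper name captured_words is hypothetical). Matches of the first two alternatives leave group 1 empty, so they are filtered out. Note that \w does not match a dot, so trial.com comes back as two words; the second pattern below, with [\w.]+, is a variation that is not part of the answer and keeps the domain whole.

```python
import re

ORIGINAL = r"//|/.*|(\w+)"      # pattern from this answer
WITH_DOTS = r"//|/.*|([\w.]+)"  # assumed variant, not from the answer

def captured_words(url, pattern=ORIGINAL):
    """Return the non-empty group-1 captures, in order.

    re.findall yields "" for matches of the ungrouped
    alternatives ("//" and "/..."), so those are dropped.
    """
    return [w for w in re.findall(pattern, url) if w]

url = "ftp://trial.com:50/papers/history.pdf"
print(captured_words(url))             # ['ftp', 'trial', 'com', '50']
print(captured_words(url, WITH_DOTS))  # ['ftp', 'trial.com', '50']
print(captured_words("market://find/tools/new"))  # ['market', 'find']
```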
Upvotes: 1