Reputation: 10081
Say I want to extract the hostname and the port number from a string like this:
stackoverflow.com:443
That is pretty easy. I could do something like this:
(?<host>.*):(?<port>\d*)
I am not worried about protocol schemes, valid host names/IP addresses, or TCP/UDP ports; they are not important to my request.
However, I also need to support one twist that takes this beyond my knowledge of regular expressions - the host name without the port:
stackoverflow.com
I want to use a single regular expression for this, and I want to use named capture groups such that the host group will always exist in a positive match, while the port group exists if and only if we have a colon followed by a number of digits.
I have tried doing a positive lookbehind from my feeble understanding of it:
(?<host>.*)(?<=:)(?<port>\d*)
This comes close, but the colon (:) is included at the end of the host capture. So I tried to change the host to include anything but the colon like this:
(?<host>[^:]*)(?<=:)(?<port>\d*)
That gives me an empty host capture.
Any suggestions on how to accomplish this, i.e. make the colon and the port number optional, but if they are there, include the port number capture and make the colon "vanish"?
Edit: All four answers I have received work well for me, but pay attention to the comments on some of them. I accepted sln's answer because of the nice layout and explanation of the regexp structure. Thanks to all who replied!
Upvotes: 4
Views: 3757
Reputation: 8954
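I suggest using the Uri class instead of regular expressions.
// Use the Uri class for parsing only; the "http://" prefix is just a placeholder scheme
var uri = new Uri("http://" + fullAddress);
// Get the host
string host = uri.DnsSafeHost;
// Get the port (Uri.Port returns the scheme default, 80 for http, when the input has no port)
ushort portNum = (ushort)uri.Port;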
The benefits are
See a usage sample on .NET Fiddle
Upvotes: 6
Reputation:
Maybe this: (?<host>[^:]+)(?::(?<port>\d+))?
(?<host> [^:]+ ) # (1), Host, required
(?: # Cluster group start, optional
: # Colon ':'
(?<port> \d+ ) # (2), Port number
)? # Cluster group end
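A minimal sketch of how the pattern above might be used from C#. The class and method names (HostPortDemo, MatchHostPort) are made up for illustration; the pattern is copied unchanged and is not anchored, so it makes no attempt to validate the input. The point is that Group.Success tells you whether the optional port group took part in the match.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HostPortDemo
{
    // Pattern from the answer: required host, optional ":port" in a non-capturing group.
    private static readonly Regex HostPort =
        new Regex(@"(?<host>[^:]+)(?::(?<port>\d+))?");

    // Hypothetical helper: returns the raw Match so callers can inspect the groups.
    public static Match MatchHostPort(string input)
    {
        return HostPort.Match(input);
    }

    public static void Main()
    {
        foreach (string input in new[] { "stackoverflow.com:443", "stackoverflow.com" })
        {
            Match m = MatchHostPort(input);
            Group port = m.Groups["port"];
            // port.Success is false when the optional group did not participate.
            string portText = port.Success ? port.Value : "(none)";
            Console.WriteLine(input + " -> host=" + m.Groups["host"].Value + ", port=" + portText);
        }
        // Prints:
        // stackoverflow.com:443 -> host=stackoverflow.com, port=443
        // stackoverflow.com -> host=stackoverflow.com, port=(none)
    }
}
```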
Edit: If you used a capture group instead of the non-capturing cluster group, this is how .NET "counts" the groups in its default configuration:
(?<host> [^:]+ ) #_(2), Host, required
( # (1 start), Unnamed capture group, optional
: # Colon ':'
(?<port> \d+ ) #_(3), Port number
)? # (1 end)
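The numbering above can be checked with Regex.GroupNumberFromName. This is only a sketch; the class and method names (GroupNumberingDemo, Number) are invented for the example. It also shows RegexOptions.ExplicitCapture, which is one way to leave the default configuration: with it, unnamed parentheses stop capturing and the named groups are numbered from 1.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupNumberingDemo
{
    // Same pattern as above, with a plain capture group around the optional part.
    public const string Pattern = @"(?<host>[^:]+)(:(?<port>\d+))?";

    // Hypothetical helper: the number .NET assigns to a named group under the given options.
    public static int Number(string groupName, RegexOptions options)
    {
        return new Regex(Pattern, options).GroupNumberFromName(groupName);
    }

    public static void Main()
    {
        // Default: unnamed groups are numbered first, then named groups.
        Console.WriteLine(Number("host", RegexOptions.None));            // 2
        Console.WriteLine(Number("port", RegexOptions.None));            // 3

        // ExplicitCapture: only named groups capture, so they start at 1.
        Console.WriteLine(Number("host", RegexOptions.ExplicitCapture)); // 1
        Console.WriteLine(Number("port", RegexOptions.ExplicitCapture)); // 2
    }
}
```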
Upvotes: 2
Reputation: 12809
Try this:
(?<host>[^:]+)(:(?<port>\d+))?
This makes the whole colon-and-port part an optional group and captures the port number inside it. I also used the plus sign to ensure that the host name and the port number each contain at least one character.
Upvotes: 1
Reputation: 39405
If your host name doesn't contain a colon (:), as an IPv6 address would, then try this one:
(?<host>[^:]*):?(?<port>\d*)
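One thing to watch with this pattern, sketched below under the assumption that it is used as written: because every part is optional, the port group always takes part in a successful match, so Group.Success is true even when no port was given and the capture is simply empty. Checking the capture length is one way to tell the two cases apart. The names LenientPatternDemo and HasPort are made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LenientPatternDemo
{
    // Pattern from the answer: host, optional colon, and port may each be empty.
    public static readonly Regex HostPort =
        new Regex(@"(?<host>[^:]*):?(?<port>\d*)");

    // Hypothetical helper: treat an empty port capture as "no port".
    public static bool HasPort(Match m)
    {
        return m.Groups["port"].Length > 0;
    }

    public static void Main()
    {
        Match withPort = HostPort.Match("stackoverflow.com:443");
        Match withoutPort = HostPort.Match("stackoverflow.com");

        Console.WriteLine(withPort.Groups["port"].Value);      // 443
        // The group matched the empty string, so Success is still true here.
        Console.WriteLine(withoutPort.Groups["port"].Success); // True
        Console.WriteLine(HasPort(withPort));                  // True
        Console.WriteLine(HasPort(withoutPort));               // False
    }
}
```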
Upvotes: 1