Rune Jacobsen

Reputation: 10081

Extract host/port combo with .net regex - port part optional

Say I want to extract the hostname and the port number from a string like this:

stackoverflow.com:443

That is pretty easy. I could do something like this:

(?<host>.*):(?<port>\d*)

I am not worried about protocol schemes, valid host names/IP addresses, or TCP/UDP port ranges; none of that matters for my purposes.

However, I also need to support one twist that takes this beyond my knowledge of regular expressions - the host name without the port:

stackoverflow.com

I want to use a single regular expression for this, and I want to use named capture groups such that the host group will always exist in a positive match, while the port group exists if and only if we have a colon followed by a number of digits.

I have tried doing a positive lookbehind from my feeble understanding of it:

(?<host>.*)(?<=:)(?<port>\d*)

This comes close, but the colon (:) is included at the end of the host capture. So I tried to change the host to include anything but the colon like this:

(?<host>[^:]*)(?<=:)(?<port>\d*)

That gives me an empty host capture.

Any suggestions on how to accomplish this, i.e. make the colon and the port number optional, but if they are there, include the port number capture and make the colon "vanish"?

Edit: All four answers I have received work well for me, but pay attention to the comments on some of them. I accepted sln's answer because of its clear layout and explanation of the regex structure. Thanks to all who replied!

Upvotes: 4

Views: 3757

Answers (5)

Alex Klaus

Reputation: 8954

I suggest using the Uri class instead of regular expressions.

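// Use the Uri class for parsing only; "http://" is a dummy scheme.
// fullAddress is the input string, e.g. "stackoverflow.com:443".
var uri = new Uri("http://" + fullAddress);
// Get the host
string host = uri.DnsSafeHost;
// Get the port (the scheme default, 80 for http, when none is given)
ushort portNum = (ushort)uri.Port;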

The benefits are:

  • It supports:
    • IPv4 and IPv6
    • Internationalized domain names (IDN)
  • It can be extended to take the scheme into account in the future
  • The code is short and standardised, so there are fewer mistakes

See a usage sample on .NET Fiddle.
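One thing the snippet above does not show is how to tell "no port given" apart from an explicit port, since Uri.Port falls back to the scheme default. A minimal sketch, assuming the dummy "http://" prefix from above; the class and method names (UriHostPort, Parse) are made up for illustration:

```csharp
using System;

public static class UriHostPort
{
    // Hypothetical helper: splits "host[:port]" using System.Uri.
    // Port is null when the input relies on the scheme default.
    public static (string Host, int? Port) Parse(string fullAddress)
    {
        // "http://" is a dummy scheme so that Uri accepts a bare host[:port].
        var uri = new Uri("http://" + fullAddress);

        // Caveat: an explicit ":80" is also the http default, so it is
        // reported as null here, the same as a missing port.
        int? port = uri.IsDefaultPort ? (int?)null : uri.Port;
        return (uri.DnsSafeHost, port);
    }

    public static void Main()
    {
        foreach (var input in new[] { "stackoverflow.com:443", "stackoverflow.com" })
        {
            var (host, port) = Parse(input);
            Console.WriteLine($"host={host}, port={(port.HasValue ? port.ToString() : "(none)")}");
        }
    }
}
```

If port 80 has to be distinguished from "no port", this approach is not enough on its own and the regex answers may be a better fit.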

Upvotes: 6

user557597

Reputation:

Maybe this: (?<host>[^:]+)(?::(?<port>\d+))?

 (?<host> [^:]+ )               # (1), Host, required
 (?:                            # Cluster group start, optional
      :                              # Colon ':'
      (?<port> \d+ )                 # (2), Port number
 )?                             # Cluster group end
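For reference, here is a rough sketch of how the pattern might be used from C#. The ^ and $ anchors are an addition of this sketch (without them, an input such as "host:abc" would still match with host = "host" and no port), and the names HostPortRegex and TryParse are made up for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

public static class HostPortRegex
{
    // The pattern from above, plus ^...$ anchors (an assumption of this sketch).
    private static readonly Regex Pattern =
        new Regex(@"^(?<host>[^:]+)(?::(?<port>\d+))?$");

    // Hypothetical helper: host is set on every successful match,
    // port is null unless a colon followed by digits was present.
    public static bool TryParse(string input, out string host, out string port)
    {
        host = null;
        port = null;

        Match m = Pattern.Match(input);
        if (!m.Success) return false;

        host = m.Groups["host"].Value;

        // Group.Success is false when the optional part did not participate.
        Group portGroup = m.Groups["port"];
        port = portGroup.Success ? portGroup.Value : null;
        return true;
    }

    public static void Main()
    {
        foreach (var input in new[] { "stackoverflow.com:443", "stackoverflow.com" })
        {
            if (TryParse(input, out var host, out var port))
                Console.WriteLine($"host={host}, port={port ?? "(none)"}");
        }
    }
}
```

Checking Groups["port"].Success rather than testing Value for an empty string is what gives the "port exists if and only if" behaviour asked for in the question.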

Edit: If you replace the non-capturing cluster group with an ordinary capture group, this is how .NET numbers the groups by default (unnamed groups first, then named groups):

 (?<host> [^:]+ )         #_(2), Host, required                           
 (                        # (1 start), Unnamed capture group, optional
      :                        # Colon ':'
      (?<port> \d+ )           #_(3), Port number                           
 )?                       # (1 end)
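The numbering can be checked with Regex.GroupNumberFromName. The sketch below is only an illustration; the class and method names (GroupNumbering, Numbers) are made up. It also shows RegexOptions.ExplicitCapture, which stops unnamed parentheses from capturing at all:

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupNumbering
{
    // Same pattern as above, written on one line with an unnamed group.
    public const string Pattern = @"(?<host>[^:]+)(:(?<port>\d+))?";

    // Hypothetical helper: returns the group numbers assigned to "host" and "port".
    public static (int Host, int Port) Numbers(RegexOptions options)
    {
        var regex = new Regex(Pattern, options);
        return (regex.GroupNumberFromName("host"), regex.GroupNumberFromName("port"));
    }

    public static void Main()
    {
        // Default: the unnamed group is 1, so host is 2 and port is 3.
        Console.WriteLine(Numbers(RegexOptions.None));

        // ExplicitCapture: only named groups capture, so host is 1 and port is 2.
        Console.WriteLine(Numbers(RegexOptions.ExplicitCapture));
    }
}
```

Accessing the groups by name, as in Groups["host"], sidesteps the numbering question entirely.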

Upvotes: 2

brz

Reputation: 6016

You can use this (note the doubled backslash, which is only needed inside a regular, non-verbatim C# string literal):

(?<host>[^:]+)(:(?<port>\\d+))?

Upvotes: 1

Zoltán Tamási

Reputation: 12809

Try this:

(?<host>[^:]+)(:(?<port>\d+))?

This makes the colon and port number together an optional group and captures the port number inside it. I also used the plus sign to ensure that the host name and the port number each contain at least one character.

Upvotes: 1

Sabuj Hassan

Reputation: 39405

If your host name doesn't contain a colon (as an IPv6 address would), then try this one:

(?<host>[^:]*):?(?<port>\d*)
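One caveat with this pattern: because the port is matched with \d*, the port group takes part in every match, so it reports success with an empty value when no port is given. A small sketch to check this (the names EmptyPortCheck and PortGroupState are made up for illustration):

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmptyPortCheck
{
    // Hypothetical helper: reports whether the port group matched and what it captured.
    public static (bool Success, string Value) PortGroupState(string input)
    {
        Match m = Regex.Match(input, @"(?<host>[^:]*):?(?<port>\d*)");
        Group port = m.Groups["port"];
        return (port.Success, port.Value);
    }

    public static void Main()
    {
        // No port in the input: the group still succeeds, with an empty capture.
        Console.WriteLine(PortGroupState("stackoverflow.com"));

        // Port present: the group holds the digits.
        Console.WriteLine(PortGroupState("stackoverflow.com:443"));
    }
}
```

With this pattern, the caller has to test for an empty Value instead of relying on Success.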

Upvotes: 1
