Vitalii

Reputation: 11091

C# regular expressions: OR condition in a lookbehind

I have text that contains URLs. Each URL can take one of three forms:

  1. without a protocol, e.g. www.website-link.ch
  2. with http, e.g. http://www.website-link.ch
  3. with https, e.g. https://www.website-link.ch

I need to extract the address part of every URL (e.g. website-link.ch). For that I want a regular expression with a positive lookbehind that matches when the text is preceded by 'www.' OR by 'http://www.' OR by 'https://www.'.

Is it possible to put an OR condition inside a positive lookbehind? It did not work for me, so I ended up with this monster:

string pattern = @"((?<=http://www\.).*\b)|((?<=https://www\.).*\b)|((?<=www\.).*\b)"; 

Is there a way to make this pattern smarter?
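For reference: the .NET regex engine does accept alternation (and variable-length subpatterns) inside a lookbehind, so the three branches can be folded into a single (?<=...). Because 'http://www.' and 'https://www.' both end in 'www.', the lookbehind (?<=www\.) on its own would already match the same positions. The sketch below is illustrative only; the class name LookbehindDemo and the host pattern [\w-]+(?:\.[\w-]+)* (used instead of .*\b so that a match stops at spaces, commas and slashes) are assumptions, not part of the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookbehindDemo
{
    // One lookbehind containing an OR (alternation). .NET allows this, including
    // branches of different lengths. The host part is an assumed, tighter
    // replacement for .*\b: dot-separated runs of word characters and hyphens.
    private static readonly Regex HostPattern =
        new Regex(@"(?<=https://www\.|http://www\.|www\.)[\w-]+(?:\.[\w-]+)*");

    // Returns every host found in the text, without the "www." prefix.
    public static string[] FindHosts(string text)
    {
        MatchCollection matches = HostPattern.Matches(text);
        var hosts = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
            hosts[i] = matches[i].Value;
        return hosts;
    }

    public static void Main()
    {
        var text = "See www.a-site.ch, http://www.b-site.ch and https://www.c-site.ch/page";
        foreach (var host in FindHosts(text))
            Console.WriteLine(host); // a-site.ch, then b-site.ch, then c-site.ch
    }
}
```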

Upvotes: 0

Views: 539

Answers (2)

Thang Coder

Reputation: 141

You could also use the System.UriBuilder class, which has built-in functionality to parse a string and separate it into its parts.

For example:

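using System;

public class Program
{
    public static void Main()
    {
        var s = "www.website-link.ch";
        // UriBuilder assumes the "http" scheme when the string has none,
        // so here Scheme is "http" and Host is "www.website-link.ch".
        var builder = new UriBuilder(s);
        if (builder.Scheme == Uri.UriSchemeHttps)
        {
            Console.WriteLine("String starts with `https`");
        }
        else
        {
            Console.WriteLine("String does not start with `https`");
        }

        // The parsed host, which still includes the "www." prefix.
        Console.WriteLine("Host: " + builder.Host);
    }
}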
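To get from UriBuilder to the address part the question asks for, one option is to read the Host property and strip a leading "www.". A minimal sketch, assuming all inputs look like the three forms in the question; the class UriHost and the method GetHostWithoutWww are hypothetical names.

```csharp
using System;

public static class UriHost
{
    // Returns the host without a leading "www." for inputs such as
    // "www.website-link.ch", "http://www.website-link.ch" or "https://www.website-link.ch".
    // UriBuilder prepends "http://" when the string has no scheme.
    public static string GetHostWithoutWww(string address)
    {
        string host = new UriBuilder(address).Host;
        return host.StartsWith("www.", StringComparison.OrdinalIgnoreCase)
            ? host.Substring(4)
            : host;
    }

    public static void Main()
    {
        var inputs = new[] { "www.website-link.ch", "http://www.website-link.ch", "https://www.website-link.ch" };
        foreach (var s in inputs)
        {
            var builder = new UriBuilder(s);
            Console.WriteLine($"{s} -> scheme={builder.Scheme}, host={GetHostWithoutWww(s)}");
        }
    }
}
```

Unlike a regex, this also validates the input: a string that is not a well-formed URI makes the UriBuilder constructor throw a UriFormatException.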

Upvotes: 1

Dion Williams

Reputation: 161

You can avoid lookbehind entirely in this case by putting the protocol and "www." parts in non-capturing groups.

var regex = new Regex(@"(?:(?:https?://)?www\.)(.*\b)");

Only the text matched by (.*\b) will be captured since all the other groups use the non-capturing (?:) syntax.
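A small sketch that makes the difference visible: Groups[0] always holds the whole match (protocol and "www." included), while Groups[1] holds only the text from (.*\b). The class GroupDemo and the method Inspect are hypothetical names used for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupDemo
{
    // Same pattern as in this answer: the scheme and "www." sit in (?:...) groups,
    // so (.*\b) is the only numbered group (group 1).
    private static readonly Regex Pattern = new Regex(@"(?:(?:https?://)?www\.)(.*\b)");

    public static (string whole, string captured, int groupCount) Inspect(string input)
    {
        Match m = Pattern.Match(input);
        // Groups[0] is the entire match, and Groups.Count includes it.
        return (m.Groups[0].Value, m.Groups[1].Value, m.Groups.Count);
    }

    public static void Main()
    {
        var (whole, captured, count) = Inspect("https://www.website-link.ch");
        Console.WriteLine($"Groups[0] = {whole}");    // https://www.website-link.ch
        Console.WriteLine($"Groups[1] = {captured}"); // website-link.ch
        Console.WriteLine($"Groups.Count = {count}"); // 2
    }
}
```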

The hostname of the website address (without "www.") can then be read from the first captured group of the match:

var hostnameMatch = regex.Match("http://www.website-link.ch").Groups[1];
if (hostnameMatch.Success)
    Console.WriteLine("Matched: {0}", hostnameMatch.Value); // Outputs "Matched: website-link.ch"

MSDN has some more information on the properties available for each matched group.
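One caveat, since the question mentions URLs embedded in longer text: .* is greedy, so (.*\b) runs to the last word boundary on the line and swallows everything after the first URL. The sketch below compares the answer's pattern with an assumed tighter alternative that uses a named group (host) limited to dot-separated labels of word characters and hyphens; the class HostFinder and its methods are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HostFinder
{
    // The answer's pattern: greedy .* followed by \b.
    private static readonly Regex Greedy = new Regex(@"(?:(?:https?://)?www\.)(.*\b)");

    // Assumed alternative: stops at spaces, commas, slashes and other
    // characters that cannot appear in a host label.
    private static readonly Regex Tight =
        new Regex(@"(?:https?://)?www\.(?<host>[\w-]+(?:\.[\w-]+)*)");

    // Collects the given group (by name or by number as a string) from every match.
    private static List<string> Find(Regex regex, string groupName, string text)
    {
        var result = new List<string>();
        foreach (Match m in regex.Matches(text))
            result.Add(m.Groups[groupName].Value);
        return result;
    }

    public static List<string> FindGreedy(string text) => Find(Greedy, "1", text);
    public static List<string> FindTight(string text) => Find(Tight, "host", text);

    public static void Main()
    {
        const string text = "Visit www.a-site.ch or https://www.b-site.ch/path today.";
        Console.WriteLine(string.Join(" | ", FindGreedy(text))); // a-site.ch or https://www.b-site.ch/path today
        Console.WriteLine(string.Join(" | ", FindTight(text)));  // a-site.ch | b-site.ch
    }
}
```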

Upvotes: 1