Reputation: 11091
I have text with some URLs inside. There can be 3 types of URLs.
I need to get the address part from every URL (e.g. website-link.ch). For this I need a regular expression with a positive lookbehind that succeeds if the text starts with 'www.' OR with 'http://www.' OR with 'https://www.'
Is it possible to put an OR condition inside a positive lookbehind? It did not work for me, and all I managed to create was this monster.
string pattern = @"((?<=http://www\.).*\b)|((?<=https://www\.).*\b)|((?<=www\.).*\b)";
Is it possible to make the pattern smarter?
Upvotes: 0
Views: 539
Reputation: 141
You could also use the System.UriBuilder class, which has built-in functionality to parse a string and separate it into its parts.
For example:
using System;

public class Program
{
    public static void Main()
    {
        // UriBuilder assumes the "http" scheme when the string has none.
        var s = "www.website-link.ch";
        var builder = new UriBuilder(s);

        if (builder.Scheme == Uri.UriSchemeHttps)
        {
            Console.WriteLine("String starts with `https`");
        }
        else
        {
            Console.WriteLine("String does not start with `https`");
        }
    }
}
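To get at the address part the question asks for, the Host property of the same builder can be used. The following is only a minimal sketch: the class name UrlParts and the helper HostWithoutWww are hypothetical names introduced for illustration, and stripping a leading "www." with StartsWith/Substring is an assumption about what "address part" means here.

```csharp
using System;

public static class UrlParts
{
    // Hypothetical helper: returns the host name without a leading "www.".
    public static string HostWithoutWww(string url)
    {
        // UriBuilder prepends "http://" when the input has no scheme,
        // so "www.website-link.ch" parses just like the full forms.
        var host = new UriBuilder(url).Host;

        return host.StartsWith("www.", StringComparison.OrdinalIgnoreCase)
            ? host.Substring(4) // drop the four characters of "www."
            : host;
    }

    public static void Main()
    {
        var samples = new[]
        {
            "www.website-link.ch",
            "http://www.website-link.ch",
            "https://www.website-link.ch"
        };

        foreach (var s in samples)
            Console.WriteLine(HostWithoutWww(s));
    }
}
```

Note that this works on one URL at a time and that the UriBuilder constructor throws a UriFormatException for strings it cannot parse, so it does not by itself find URLs embedded in longer text.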
Upvotes: 1
Reputation: 161
You can avoid using lookbehind in this case by simply putting the protocol and "www." parts in non-capturing groups.
var regex = new Regex(@"(?:(?:https?://)?www\.)(.*\b)");
Only the text matched by (.*\b) will be captured, since all the other groups use the non-capturing (?:) syntax.
The hostname of the website address (without "www.") can then be accessed by checking out the captured groups of the match:
var hostnameMatch = regex.Match("http://www.website-link.ch").Groups[1];
if (hostnameMatch.Success)
    Console.WriteLine("Matched: {0}", hostnameMatch.Value); // Outputs "Matched: website-link.ch"
MSDN has some more information on the properties available for each matched group.
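As a side note on the original question: the .NET regex engine does accept optional groups and alternation inside a lookbehind, so the three-branch pattern can also be condensed while keeping the lookbehind. The sketch below is an illustration under assumptions, not part of the answer above: the class HostExtractor, the method ExtractHosts, and the sample text are made up, and the host pattern [\w.-]+\b replaces .*\b so that a match stops at the end of the host instead of running to the end of the line.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HostExtractor
{
    // Lookbehind with an optional scheme before "www.".
    // The host part is limited to word characters, dots and hyphens (an assumption).
    private static readonly Regex HostPattern =
        new Regex(@"(?<=(?:https?://)?www\.)[\w.-]+\b", RegexOptions.IgnoreCase);

    // Hypothetical helper: collects every host found in the text.
    public static List<string> ExtractHosts(string text)
    {
        var hosts = new List<string>();
        foreach (Match m in HostPattern.Matches(text))
            hosts.Add(m.Value);
        return hosts;
    }

    public static void Main()
    {
        var text = "See www.website-link.ch, http://www.example.org/page and https://www.example.com.";
        foreach (var host in ExtractHosts(text))
            Console.WriteLine(host);
    }
}
```

Because all three prefixes end in "www.", the optional scheme inside the lookbehind is not strictly needed; it is kept here only to show that the construct is allowed.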
Upvotes: 1