Davs Howard

Reputation: 139

.NET possible URL regex

I'm trying to create a regex that captures everything after the final "/" in a possible URL, not counting a "/" that is the last character of the string.

I have this so far:

(?<url>(http(s)?://)?([\w-]+\.)+[\w-]+[.com]+?[a-zA-Z0-9\.\/\?\:@\-_=#]+(/[/?%&=]*))

My test URLs are

https://linkedin.com/in/username

https://www.facebook.com/username

username

https://plus.google.com/u/0/username/

This passes on all except the final one. The correct result would be username for each test.

Upvotes: 0

Views: 107

Answers (3)

Veverke

Reputation: 11358

I think you can benefit from the Uri class the framework provides. It does not give you the whole solution (it does not handle segments ending with "/"), but it does most of the job.

    List<string> strings = new List<string>
    {
        "https://linkedin.com/in/username",
        "https://www.facebook.com/username",
        "username",
        "https://plus.google.com/u/0/username/"
    };

    List<Tuple<int, string>> results = new List<Tuple<int, string>>();

    for (int i = 0; i < strings.Count; i++)
    {
        var s = strings.ElementAt(i);
        try
        {
            Uri uri = new Uri(s);
            var lastSegment = uri.Segments.LastOrDefault();
            // Check for null/empty first so EndsWith is never called on a null reference
            if (!string.IsNullOrEmpty(lastSegment) && !lastSegment.EndsWith("/"))
                results.Add(new Tuple<int, string>(i, lastSegment));
        }
        catch (Exception ex)
        {
            //s is not a valid uri and thus a valid uri object could not be created out of it
            results.Add(new Tuple<int, string>(i, ex.Message));
        }
    }

    foreach (var segment in results)
        Console.WriteLine(segment);

Output (each tuple's number is the element's index in your sample; the last element is not added because you do not want segments ending with "/"):

[screenshot of the console output]
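
To also cover the trailing "/" case and the bare "username" input, the same Uri idea can be combined with Uri.TryCreate, which avoids the try/catch. This is only a minimal sketch: the class name LastSegmentDemo and the helper LastSegment are hypothetical names made up for illustration, and it assumes that input which is not an absolute URI should be returned as-is.

```csharp
using System;
using System.Linq;

static class LastSegmentDemo
{
    // Hypothetical helper (not part of the framework): returns the last
    // non-empty path segment of an absolute URI. Assumption: input that is
    // not an absolute URI (e.g. "username") is returned without slashes.
    public static string LastSegment(string input)
    {
        if (!Uri.TryCreate(input, UriKind.Absolute, out Uri uri))
            return input.Trim('/');

        // Segments look like "/", "u/", "0/", "username/"; strip the slashes
        // and keep the last segment that still has content.
        return uri.Segments
                  .Select(segment => segment.Trim('/'))
                  .LastOrDefault(segment => segment.Length > 0) ?? string.Empty;
    }

    static void Main()
    {
        string[] samples =
        {
            "https://linkedin.com/in/username",
            "https://www.facebook.com/username",
            "username",
            "https://plus.google.com/u/0/username/"
        };

        foreach (string sample in samples)
            Console.WriteLine(LastSegment(sample)); // prints "username" each time
    }
}
```

Note that Uri.Segments returns the path in its escaped form, so a segment containing percent-encoded characters would come back still encoded.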

Upvotes: 1

KaSh

Reputation: 175

(?<url>(http(s)?://)?([\w-]+\.)+[\w-]+[.com]+?[a-zA-Z0-9\.\/\?\:@\-_=#]+(/*[/?%&=]*)) 

That should cover all of your test cases except "username", which the regex Thomas wrote should cover.

Upvotes: 0

Thomas Ayoub

Reputation: 29431

If you really want to go with regex (demo):

\/(\w+)$|(\w+)$|\/(\w+)\/$
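
For reference, here is one way the regex route might look from C#. This is a sketch, not the pattern above: it assumes a single simplified pattern, ([^/]+)/?$, which takes the last run of non-slash characters and allows one optional trailing "/". Unlike \w+, it accepts any character other than "/", so dots, dashes or a query string would be included in the result. The names RegexLastSegment and Extract are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexLastSegment
{
    // Assumed pattern (a simplification, not the one from the answer):
    // the last run of non-slash characters, optionally followed by one "/".
    static readonly Regex Pattern = new Regex(@"([^/]+)/?$");

    // Hypothetical helper: returns the captured segment, or an empty
    // string when nothing matches (e.g. for "" or "/").
    public static string Extract(string input)
    {
        Match match = Pattern.Match(input);
        return match.Success ? match.Groups[1].Value : string.Empty;
    }

    static void Main()
    {
        Console.WriteLine(Extract("https://linkedin.com/in/username"));      // username
        Console.WriteLine(Extract("https://www.facebook.com/username"));     // username
        Console.WriteLine(Extract("username"));                              // username
        Console.WriteLine(Extract("https://plus.google.com/u/0/username/")); // username
    }
}
```

Because everything lands in capture group 1, there is no need to check which of several alternation groups matched.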

If you want to go full C# and a bit of Linq:

List<string> urls = new List<string>
{
    @"https://linkedin.com/in/username",
    @"https://www.facebook.com/username",
    @"username",
    @"https://plus.google.com/u/0/username/",
};

foreach (string url in urls)
{
    // Drop any trailing '/', then take the text after the last remaining '/'
    Console.Out.WriteLine(url.TrimEnd('/').Split('/').Last());
}

Upvotes: 0
