Reputation: 5802
How can I remove the protocol from a URI, i.e. strip the leading http://?
Upvotes: 39
Views: 31845
Reputation: 3215
The other answers work in most cases, but IMO neither is a complete solution:
uri.Host + uri.PathAndQuery + uri.Fragment;
drops the port if one is specified (e.g. http://www.example.com:8080/path/ becomes www.example.com/path/).
uri.GetComponents(UriComponents.AbsoluteUri & ~UriComponents.Scheme, UriFormat.UriEscaped)
preserves ports and seems generally better, but in some cases (most likely with input that is incorrect, though not impossible) I got some characters escaped that shouldn't have been.
In both cases a '/' is appended when the URL has no path (e.g. http://www.example.com becomes www.example.com/), so if your URL is potentially sensitive to that difference, or you care how it looks, you need to check whether the slash was present before and, if not, TrimEnd it.
On top of that, both of those solutions throw an exception if the Uri is considered invalid, so if your URL doesn't already have a scheme (e.g. www.example.com) the code above fails.
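To make the differences concrete, here is a minimal sketch that puts the two Uri-based approaches side by side. The class and method names (SchemeRemovalDemo, ViaHost, ViaGetComponents, IsAbsolute) are made up for illustration; only the System.Uri members come from the answers in this thread.

```csharp
using System;

public static class SchemeRemovalDemo
{
    // Approach 1: concatenate Host, PathAndQuery and Fragment.
    // Host never contains the port, so an explicit port is lost.
    public static string ViaHost(Uri uri) =>
        uri.Host + uri.PathAndQuery + uri.Fragment;

    // Approach 2: request every component of the absolute URI except the scheme.
    public static string ViaGetComponents(Uri uri) =>
        uri.GetComponents(UriComponents.AbsoluteUri & ~UriComponents.Scheme,
                          UriFormat.UriEscaped);

    // Hypothetical guard: returns false instead of throwing when the
    // input cannot be parsed as an absolute URI (e.g. it has no scheme).
    public static bool IsAbsolute(string url) =>
        Uri.TryCreate(url, UriKind.Absolute, out _);
}
```

With new Uri("http://www.example.com:8080/path/"), ViaHost returns www.example.com/path/ while ViaGetComponents keeps the :8080. With new Uri("http://www.example.com"), ViaHost returns www.example.com/ with the extra slash. IsAbsolute("www.example.com") is false, which is the case where new Uri(...) would throw.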
If you want something really generic that works for input you might not control (e.g. user input), I'd probably stick to a simpler solution, e.g.:
var endOfSchemaIdx = url.IndexOf("://");
if(endOfSchemaIdx != -1)
return url.Substring(endOfSchemaIdx+3);
return url;
You can also fetch the scheme via a library like Flurl (which doesn't throw an exception on www.example.com), look up the first occurrence of the scheme followed by "://", and delete it if it exists. I feel safer if the rest of the URL is not processed by any library, unless that is your intention.
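The "://" snippet in this answer uses bare return statements, so it needs a method around it to compile. One way to wrap it, as a sketch: UrlText and StripScheme are hypothetical names, and the ordinal comparison is my own addition, not part of the original snippet.

```csharp
using System;

public static class UrlText
{
    // Hypothetical helper: removes everything up to and including the
    // first "://". Input without "://" is returned unchanged.
    public static string StripScheme(string url)
    {
        // Ordinal comparison avoids culture-sensitive matching (an assumption
        // on my part; the original snippet uses the default overload).
        var endOfSchemeIdx = url.IndexOf("://", StringComparison.Ordinal);
        return endOfSchemeIdx != -1 ? url.Substring(endOfSchemeIdx + 3) : url;
    }
}
```

UrlText.StripScheme("http://www.example.com:8080/path") keeps the port and adds no trailing slash, and UrlText.StripScheme("www.example.com") returns its input. One caveat of pure string handling: a schemeless input that contains "://" later on (for instance inside a query string) would be cut at that point.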
Upvotes: 0
Reputation: 10941
You can use the System.Uri class like this:
System.Uri uri = new Uri("http://stackoverflow.com/search?q=something");
string uriWithoutScheme = uri.Host + uri.PathAndQuery + uri.Fragment;
This will give you stackoverflow.com/search?q=something
Edit: this also works for about:blank :-)
Upvotes: 71
Reputation: 1815
The best (and to me most beautiful) way is to use the Uri class to parse the string as an absolute URI and then use the GetComponents method with the correct UriComponents enumeration value to remove the scheme:
Uri uri;
if (Uri.TryCreate("http://stackoverflow.com/...", UriKind.Absolute, out uri))
{
return uri.GetComponents(UriComponents.AbsoluteUri &~ UriComponents.Scheme, UriFormat.UriEscaped);
}
For further reference: the UriComponents enumeration is decorated with the FlagsAttribute, so bitwise operations (e.g. & and |) can be used on it. In this case the &~ removes the bits for UriComponents.Scheme from UriComponents.AbsoluteUri by combining the AND operator with the bitwise complement operator.
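A small sketch of what that mask contains may help. FlagsDemo and its members are names invented for this example; UriComponents and Enum.HasFlag are the real .NET APIs.

```csharp
using System;

public static class FlagsDemo
{
    // AbsoluteUri is the union of all component flags; AND-ing with the
    // complement of Scheme clears only the Scheme bit.
    public static readonly UriComponents WithoutScheme =
        UriComponents.AbsoluteUri & ~UriComponents.Scheme;

    // The same value written out flag by flag.
    public static readonly UriComponents SpelledOut =
        UriComponents.UserInfo | UriComponents.Host | UriComponents.Port |
        UriComponents.Path | UriComponents.Query | UriComponents.Fragment;

    public static bool ContainsScheme => WithoutScheme.HasFlag(UriComponents.Scheme);
    public static bool ContainsHost => WithoutScheme.HasFlag(UriComponents.Host);
}
```

Here FlagsDemo.ContainsScheme is false, FlagsDemo.ContainsHost is true, and WithoutScheme equals SpelledOut, so the &~ form is just a shorter way of listing every component except the scheme.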
Upvotes: 20
Reputation: 11903
It's not the most beautiful way, but try something like this:
var uri = new Uri("http://www.example.com");
var scheme = uri.Scheme;
var result = uri.ToString().Substring(scheme.Length + 3); // skip the scheme plus "://"
Upvotes: 1
Reputation: 1063013
In the general sense (not limiting to http/https), an (absolute) uri is always a scheme followed by a colon, followed by scheme-specific data. So the only safe thing to do is cut at the scheme:
string s = "http://stackoverflow.com/questions/4517240/";
int i = s.IndexOf(':');
if (i > 0) s = s.Substring(i + 1);
In the case of http and a few others you may also want to .TrimStart('/'), but this is not part of the scheme, and is not guaranteed to exist. Trivial example: about:blank.
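As a sketch, the colon cut and the optional slash trimming can be kept as two separate steps. SchemeCutter and both method names are hypothetical; the logic is the IndexOf(':') snippet from this answer plus the suggested TrimStart('/').

```csharp
using System;

public static class SchemeCutter
{
    // Cuts at the first colon, which ends the scheme of an absolute URI.
    // Input without a colon (or starting with one) is returned unchanged.
    public static string CutAtScheme(string s)
    {
        int i = s.IndexOf(':');
        return i > 0 ? s.Substring(i + 1) : s;
    }

    // Optional second step for http-like schemes: also drop leading slashes.
    public static string CutAtSchemeAndSlashes(string s) =>
        CutAtScheme(s).TrimStart('/');
}
```

CutAtScheme("about:blank") gives blank. For "http://stackoverflow.com/questions/4517240/", CutAtScheme leaves the leading // in place and CutAtSchemeAndSlashes removes it. Note that this assumes the input really is an absolute URI: a schemeless string such as www.example.com:8080 would be cut at the port colon.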
Upvotes: 15
Reputation: 971
You could use a regular expression for this. The sample below should meet your need.
using System;
using System.Text.RegularExpressions;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
string txt="http://www.google.com";
string re1="(?:http|https):\\/{2}([\\w]+(?:[\\/|\\.]?)(?:[^\\s\"]*))"; // group 1: the URL without the scheme
Regex r = new Regex(re1,RegexOptions.IgnoreCase|RegexOptions.Singleline);
Match m = r.Match(txt);
if (m.Success)
{
String httpurl1=m.Groups[1].ToString();
Console.Write("("+httpurl1.ToString()+")"+"\n");
}
Console.ReadLine();
}
}
}
Let me know if this helps
Upvotes: 0