user85594

Reputation: 261

Getting the Redirected URL from the Original URL

I have a table in my database which contains the URLs of some websites. I have to open those URLs and verify some links on those pages. The problem is that some URLs get redirected to other URLs. My logic is failing for such URLs.

Is there some way through which I can pass my original URL string and get the redirected URL back?

Example: I am trying with this URL: http://individual.troweprice.com/public/Retail/xStaticFiles/FormsAndLiterature/CollegeSavings/trp529Disclosure.pdf

It gets redirected to this one: http://individual.troweprice.com/staticFiles/Retail/Shared/PDFs/trp529Disclosure.pdf

I tried the following code:

HttpWebRequest req = (HttpWebRequest)WebRequest.Create(Uris);
req.Proxy = proxy;
req.Method = "HEAD";
req.AllowAutoRedirect = false;

HttpWebResponse myResp = (HttpWebResponse)req.GetResponse();
if (myResp.StatusCode == HttpStatusCode.Redirect)
{
  MessageBox.Show("redirected to:" + myResp.GetResponseHeader("Location"));
}

When I execute the code above, it returns HttpStatusCode.OK. I don't understand why it is not treated as a redirect. If I open the link in Internet Explorer, it redirects to the other URL and opens the PDF file.

Can someone help me understand why it is not working properly for the example URL?

By the way, I checked with Hotmail's URL (http://www.hotmail.com) and it correctly returns the redirected URL.

Upvotes: 23

Views: 76236

Answers (12)

Konstantin S.

Reputation: 1505

Here are two async HttpClient versions:

Works in .NET Framework and .NET Core

public static async Task<Uri> GetRedirectedUrlAsync(Uri uri, CancellationToken cancellationToken = default)
{
    using var client = new HttpClient(new HttpClientHandler
    {
        AllowAutoRedirect = false,
    }, true);
    using var response = await client.GetAsync(uri, cancellationToken);

    // Location may be relative, so resolve it against the request URI.
    return new Uri(uri, response.Headers.GetValues("Location").First());
}

Works in .NET Core

public static async Task<Uri> GetRedirectedUrlAsync(Uri uri, CancellationToken cancellationToken = default)
{
    using var client = new HttpClient();
    using var response = await client.GetAsync(uri, cancellationToken);

    return response.RequestMessage.RequestUri;
}

P.S. handler.MaxAutomaticRedirections = 1 can be used if you need to limit the number of attempts.
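
The snippets above do not define a `handler` variable, so here is a minimal sketch of one way to wire that setting up. The helper name `CreateLimitedRedirectHandler` is hypothetical; `AllowAutoRedirect` and `MaxAutomaticRedirections` are the `HttpClientHandler` properties mentioned in this answer.

```csharp
using System.Net.Http;

// Hypothetical helper (not part of the answer above): builds a handler that
// follows redirects automatically, but only up to maxRedirects hops.
static HttpClientHandler CreateLimitedRedirectHandler(int maxRedirects)
{
    return new HttpClientHandler
    {
        AllowAutoRedirect = true,
        MaxAutomaticRedirections = maxRedirects,
    };
}
```

It would be passed to the client in the same way as in the first version, for example `new HttpClient(CreateLimitedRedirectHandler(1), true)`.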

Upvotes: 5

Prithvi Raj Nandiwal

Reputation: 3294

Use this code to get the redirect URL:

public void GrtUrl(string url)
{
    HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create(url);
    webRequest.AllowAutoRedirect = false;  // IMPORTANT

    webRequest.Timeout = 10000;           // timeout 10s
    webRequest.Method = "HEAD";
    // Get the response ...
    HttpWebResponse webResponse;
    using (webResponse = (HttpWebResponse)webRequest.GetResponse())
    {
        // Now look to see if it's a redirect
        if ((int)webResponse.StatusCode >= 300 && 
            (int)webResponse.StatusCode <= 399)
        {
            string uriString = webResponse.Headers["Location"];
            // Parentheses are needed: + binds tighter than ??.
            Console.WriteLine("Redirect to " + (uriString ?? "NULL"));
            // No explicit Close() is needed; the using block disposes the response.
        }
    }
}

Upvotes: 10

Marcelo Calbucci

Reputation: 5945

This function returns the final destination of a link, even if there are multiple redirects. It doesn't account for JavaScript-based or META redirects. Note that the previous solution didn't handle relative URLs: the Location header can return something like "/newhome", which must be combined with the URL that served the response to get the full destination URL.

    public static string GetFinalRedirect(string url)
    {
        if(string.IsNullOrWhiteSpace(url))
            return url;

        int maxRedirCount = 8;  // prevent infinite loops
        string newUrl = url;
        do
        {
            HttpWebRequest req = null;
            HttpWebResponse resp = null;
            try
            {
                req = (HttpWebRequest) HttpWebRequest.Create(url);
                req.Method = "HEAD";
                req.AllowAutoRedirect = false;
                resp = (HttpWebResponse)req.GetResponse();
                switch (resp.StatusCode)
                {
                    case HttpStatusCode.OK:
                        return newUrl;
                    case HttpStatusCode.Redirect:
                    case HttpStatusCode.MovedPermanently:
                    case HttpStatusCode.RedirectKeepVerb:
                    case HttpStatusCode.RedirectMethod:
                        newUrl = resp.Headers["Location"];
                        if (newUrl == null)
                            return url;

                        if (newUrl.IndexOf("://", System.StringComparison.Ordinal) == -1)
                        {
                            // No URL scheme, so this is a relative or root-relative path
                            Uri u = new Uri(new Uri(url), newUrl);
                            newUrl = u.ToString();
                        }
                        break;
                    default:
                        return newUrl;
                }
                url = newUrl;
            }
            catch (WebException)
            {
                // Return the last known good URL
                return newUrl;
            }
            catch (Exception)
            {
                return null;
            }
            finally
            {
                if (resp != null)
                    resp.Close();
            }
        } while (maxRedirCount-- > 0);

        return newUrl;
    }
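
The relative-URL step is the part that is easiest to get wrong, so here it is in isolation as a minimal sketch. The helper name `ResolveLocation` and the example.com URLs are made up for illustration; the two-argument `Uri` constructor is the same one the function above uses.

```csharp
using System;

// Hypothetical helper: turns a Location header value into an absolute URL,
// using the URL that produced the response as the base.
static string ResolveLocation(string requestUrl, string location)
{
    // If 'location' is already absolute, the two-argument Uri constructor
    // ignores the base and keeps the absolute value.
    var resolved = new Uri(new Uri(requestUrl), location);
    return resolved.AbsoluteUri;
}
```

With a request URL of http://example.com/a/b.html, a Location of "/newhome" resolves to http://example.com/newhome, and a Location of "c.html" resolves to http://example.com/a/c.html. Because absolute values pass through unchanged, the `"://"` check is not strictly required before calling the constructor.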

Upvotes: 31

user13239154

Reputation:

This code worked for me with Unicode support:

    public static string GetFinalRedirect(string url)
    {
        try
        {
            var request = (HttpWebRequest)HttpWebRequest.Create(url);
            request.Method = "POST";
            request.AllowAutoRedirect = true;
            request.ContentType = "application/x-www-form-urlencoded";
            // Dispose the response so the underlying connection is released.
            using (var response = request.GetResponse())
            {
                return response.ResponseUri.AbsoluteUri;
            }
        }
        catch (Exception)
        {
            // Any failure (network error, invalid URL) yields an empty string.
            return "";
        }
    }

Upvotes: 0

Be05x5

Reputation: 137

After reviewing everyone's suggestions, I figured this out, at least for my case, which went through several hops: first a redirect to https and then one to the actual final location. This is a recursive function:

public static string GrtUrl(string url, int counter)
{
    ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls
    | SecurityProtocolType.Tls11
    | SecurityProtocolType.Tls12
    | SecurityProtocolType.Ssl3;
    string ReturnURL = url;
    HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create(url);
    webRequest.AllowAutoRedirect = false;  // IMPORTANT

    webRequest.Timeout = 10000;           // timeout 10s
    webRequest.Method = "HEAD";
    // Get the response ...
    HttpWebResponse webResponse;
    using (webResponse = (HttpWebResponse)webRequest.GetResponse())
    {
        // Now look to see if it's a redirect
        if ((int)webResponse.StatusCode >= 300 && (int)webResponse.StatusCode <= 399)
        {
            string uriString = webResponse.Headers["Location"];
            ReturnURL = uriString;
            if (ReturnURL == url)

            {
                webResponse.Close(); // don't forget to close it - or bad things happen!
                return ReturnURL;

            }
            else
            {
                webResponse.Close(); // don't forget to close it - or bad things happen!
                if (counter > 50)
                    return ReturnURL;
                else
                    // counter++ would pass the old value, so the limit would never be reached.
                    return GrtUrl(ReturnURL, counter + 1);
            }

            
        }

    }
    return ReturnURL; 

}

Upvotes: 2

Dmitri

Reputation: 776

One way to deal with a JavaScript redirect is to download the source of the initial page and extract the final URL directly from it. Since the redirect is done in JavaScript, the final URL should appear in the page source. Cheers

Code to extract the URL address from page source:

string href = "";
string pageSrc = "get page source using web client download string method and place output here";
Match m = Regex.Match(pageSrc, @"href=\""(.*?)\""", RegexOptions.Singleline);
if (m.Success){
    href = m.Groups[1].Value; /* will result in http://finalurl.com */
}

Upvotes: -1

Haddad

Reputation: 307

string url = ".......";
var request = (HttpWebRequest)WebRequest.Create(url);
var response = (HttpWebResponse)request.GetResponse();

string redirectUrl = response.ResponseUri.ToString();

Upvotes: -1

Armin

Reputation: 2562

I had the same problem, and after trying a lot I couldn't get what I wanted with HttpWebRequest, so I used the WebBrowser class to navigate to the first URL and then read the redirected URL:

WebBrowser browser = new WebBrowser();
browser.Navigating += new System.Windows.Forms.WebBrowserNavigatingEventHandler(this.browser_Navigating);
string urlToNavigate = "your url";
browser.Navigate(new Uri(urlToNavigate));

Then, in the Navigating handler, you can get the redirected URL. Note that the first time the browser_Navigating handler fires, e.Url is the same URL you started with, so the redirected URL arrives on the second call.

private void browser_Navigating(object sender, WebBrowserNavigatingEventArgs e)
{
    Uri uri = e.Url;
}

Upvotes: 0

jayson.centeno

Reputation: 835

This code works for me

var request = (HttpWebRequest)HttpWebRequest.Create(url);
request.Method = "POST";
request.AllowAutoRedirect = true;
request.ContentType = "application/x-www-form-urlencoded";
var response = request.GetResponse();

// After the request is sent and redirected to some page of your website,
// response.ResponseUri.AbsoluteUri contains that URL, including the query string
// (www.yourwebsite.com/returnulr?r=""... and so on).

Redirect(response.ResponseUri.AbsoluteUri); //then just do your own redirect.

Hope this helps

Upvotes: 0

Code.Town

Reputation: 1226

I made this method using your code and it returns the final redirected URL.

    public string GetFinalRedirectedUrl(string url)
    {
        string result = string.Empty;

        Uri Uris = new Uri(url);

        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(Uris);
        //req3.Proxy = proxy;
        req.Method = "HEAD";
        req.AllowAutoRedirect = false;

        HttpWebResponse myResp = (HttpWebResponse)req.GetResponse();
        if (myResp.StatusCode == HttpStatusCode.Redirect)
        {
            string temp = myResp.GetResponseHeader("Location");
            //Recursive call
            result = GetFinalRedirectedUrl(temp);
        }
        else
        {
            result = url;
        }

        return result;
    }

Note: myResp.ResponseUri does not return the final URL

Upvotes: -2

Can Berk Güder

Reputation: 113370

The URL you mentioned uses a JavaScript redirect, which will only redirect a browser. So there's no easy way to detect the redirect.

For proper (HTTP Status Code and Location:) redirects, you might want to remove

req.AllowAutoRedirect = false;

and get the final URL using

myResp.ResponseUri

as there can be more than one redirect.

UPDATE: More clarification regarding redirects:

There's more than one way to redirect a browser to another URL.

The first way is to use a 3xx HTTP status code, and the Location: header. This is the way the gods intended HTTP redirects to work, and is also known as "the one true way." This method will work on all browsers and crawlers.

And then there are the devil's ways. These include meta refresh, the Refresh: header, and JavaScript. Although these methods work in most browsers, they are definitely not guaranteed to work, and occasionally result in strange behavior (e.g. breaking the back button).

Most web crawlers, including the Googlebot, ignore these redirection methods, and so should you. If you absolutely have to detect all redirects, then you would have to parse the HTML for META tags, look for Refresh: headers in the response, and evaluate JavaScript. Good luck with the last one.
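
For the META tag case, a rough sketch of what that parsing could look like is below. The helper name `FindMetaRefreshTarget` is hypothetical, and the regex is only an approximation: it assumes http-equiv comes before content in the tag and that the URL inside content is not itself quoted. A real HTML parser would be more robust.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the target URL of a
// <meta http-equiv="refresh" content="N; url=..."> tag, or null if none is found.
static string FindMetaRefreshTarget(string html)
{
    var match = Regex.Match(
        html,
        @"<meta[^>]*http-equiv\s*=\s*[""']?refresh[""']?[^>]*content\s*=\s*[""']\s*\d+\s*;\s*url\s*=\s*([^""'>\s]+)",
        RegexOptions.IgnoreCase);

    // Group 1 holds everything after "url=" up to the closing quote.
    return match.Success ? match.Groups[1].Value : null;
}
```

The returned value may be relative, so it would still need to be resolved against the URL of the page it came from.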

Upvotes: 22

Alex

Reputation: 36626

You could check Request.UrlReferrer.AbsoluteUri to see where the request came from. If that doesn't work, can you pass the old URL as a query string parameter?

Upvotes: 0
