Nandesh

Reputation: 4633

Link with status code 200 redirects

I have a link that returns status code 200, but when I open it in a browser, it redirects.

When I fetch the same link with Python Requests, I only get the data from the original link. I tried both Requests and urllib, with no success.

  1. How to capture the final URL and its data?

  2. How can a link with status 200 redirect?

>>> import requests
>>> url = 'http://www.afaqs.com/news/story/52344_The-target-is-to-get-advertisers-to-switch-from-print-to-TV-Ravish-Kumar-Viacom18'
>>> r = requests.get(url)
>>> r.url
'http://www.afaqs.com/news/story/52344_The-target-is-to-get-advertisers-to-switch-from-print-to-TV-Ravish-Kumar-Viacom18'
>>> r.history
[]
>>> r.status_code
200

This is the link

Redirected link

Upvotes: 3

Views: 3532

Answers (2)

Deepshikha Sethi

Reputation: 61

URLs of this kind are embedded in the page itself (in script or meta tags) and are acted on by the browser, so Python does not follow them.

To get the link, simply extract it from the tag that contains it.

Upvotes: 1

Keyur Potdar

Reputation: 7248

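This is not an HTTP redirect. The server really does answer with status 200, and the page it returns tells the browser to navigate elsewhere through a meta refresh tag. requests does not interpret HTML, so requests.get(...) will not follow it and r.history stays empty. The original URL has the following page source: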

<html>
    <head>
        <meta http-equiv="refresh" content="0;URL=http://www.afaqs.com/interviews/index.html?id=572_The-target-is-to-get-advertisers-to-switch-from-print-to-TV-Ravish-Kumar-Viacom18">
        <script type="text/javascript" src="http://gc.kis.v2.scr.kaspersky-labs.com/D5838D60-3633-1046-AA3A-D5DDF145A207/main.js" charset="UTF-8"></script>
    </head>
    <body bgcolor="#FFFFFF"></body>
</html>

Here, you can see the redirected URL in the content attribute of the meta tag. Your job is to extract it. You can do that with a regex, or simply with some string split operations.

For example:

import requests

r = requests.get('http://www.afaqs.com/news/story/52344_The-target-is-to-get-advertisers-to-switch-from-print-to-TV-Ravish-Kumar-Viacom18')
redirected_url = r.text.split('URL=')[1].split('">')[0]
print(redirected_url)
# http://www.afaqs.com/interviews/index.html?id=572_The-target-is-to-get-advertisers-to-switch-from-print-to-TV-Ravish-Kumar-Viacom18

r = requests.get(redirected_url)
# Start scraping from this link...

Or, using a regex:

import re
redirected_url = re.findall(r'URL=(http.*?)">', r.text)[0]  # r: response for the original URL
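
The split and regex approaches above depend on the exact spelling and quoting of the tag. As a rough alternative, here is a minimal sketch that reads the meta refresh tag with the standard library's HTML parser instead. The names MetaRefreshParser and find_meta_refresh are made up for this illustration, and it assumes the content attribute has the usual "delay;URL=target" form:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class MetaRefreshParser(HTMLParser):
    """Hypothetical helper: remembers the target of the first meta refresh tag."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        attrs = dict(attrs)  # attribute names arrive lowercased
        if (attrs.get("http-equiv") or "").lower() != "refresh":
            return
        content = attrs.get("content") or ""
        # Assumed form: "0;URL=http://..." -> drop the delay, keep the URL part
        _, _, rest = content.partition(";")
        key, _, value = rest.strip().partition("=")
        if key.strip().lower() == "url" and value:
            self.target = value.strip().strip("'\"")


def find_meta_refresh(html, base_url):
    """Return the absolute refresh target found in html, or None if there is none."""
    parser = MetaRefreshParser()
    parser.feed(html)
    if parser.target is None:
        return None
    # Resolve relative targets against the URL the page was fetched from
    return urljoin(base_url, parser.target)
```

With the response from the first request, this would be called as find_meta_refresh(r.text, r.url). A return value of None means the page contains no meta refresh, so there is nothing further to follow.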

Upvotes: 2
