I have a link that returns status code 200, but when I open it in a browser it redirects.
When I fetch the same link with Python Requests, it simply returns the data from the original link. I tried both Requests and urllib with no success.
How can I capture the final URL and its data?
How can a link that returns status 200 redirect?
>>> url ='http://www.afaqs.com/news/story/52344_The-target-is-to-get-advertisers-to-switch-from-print-to-TV-Ravish-Kumar-Viacom18'
>>> r = requests.get(url)
>>> r.url
'http://www.afaqs.com/news/story/52344_The-target-is-to-get-advertisers-to-switch-from-print-to-TV-Ravish-Kumar-Viacom18'
>>> r.history
[]
>>> r.status_code
200
These kinds of URLs are present in a script tag because they are JavaScript code, so Python does not follow them.
To get the link, simply extract it from the respective tag.
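For redirects that really are done in an inline script, a minimal sketch of extracting the target from the script tags could look like this. It uses only the standard library's html.parser; the helper name find_js_redirect and the regex are hypothetical, and the regex only covers a few common literal patterns (location = "...", location.href = "...", location.replace("...") / location.assign("...")). Redirect URLs that are built dynamically in JavaScript will not be found this way.

```python
import re
from html.parser import HTMLParser

# Hypothetical patterns for common literal JavaScript redirects.
# search() also matches "window.location..." because it contains "location".
JS_REDIRECT = re.compile(
    r"""location(?:\.href)?\s*=\s*["']([^"']+)["']"""
    r"""|location\.(?:replace|assign)\(\s*["']([^"']+)["']\s*\)"""
)


class ScriptCollector(HTMLParser):
    """Collects the text content of each inline <script> tag."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.in_script:
            self.scripts[-1] += data


def find_js_redirect(html):
    """Return the first literal URL assigned to location in a script, or None."""
    parser = ScriptCollector()
    parser.feed(html)
    for script in parser.scripts:
        match = JS_REDIRECT.search(script)
        if match:
            # Exactly one of the two alternatives captured the URL.
            return match.group(1) or match.group(2)
    return None
```

A result such as '/next' is relative, so it would still need urllib.parse.urljoin against the page URL before being fetched.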
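This kind of redirect happens on the client side, so you won't get the redirected link directly from requests.get(...). On this page it is done with an HTML meta refresh tag rather than an HTTP 3xx response with a Location header, which is why the status code is 200 and r.history is empty. The original URL has the following page source: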
<html>
<head>
<meta http-equiv="refresh" content="0;URL=http://www.afaqs.com/interviews/index.html?id=572_The-target-is-to-get-advertisers-to-switch-from-print-to-TV-Ravish-Kumar-Viacom18">
<script type="text/javascript" src="http://gc.kis.v2.scr.kaspersky-labs.com/D5838D60-3633-1046-AA3A-D5DDF145A207/main.js" charset="UTF-8"></script>
</head>
<body bgcolor="#FFFFFF"></body>
</html>
Here you can see the redirected URL in the content attribute of the meta tag. Your job is to scrape it, which you can do with a regex or with a few string split operations.
For example:
import requests

r = requests.get('http://www.afaqs.com/news/story/52344_The-target-is-to-get-advertisers-to-switch-from-print-to-TV-Ravish-Kumar-Viacom18')
# Take the text after 'URL=' up to the '">' that closes the meta tag
redirected_url = r.text.split('URL=')[1].split('">')[0]
print(redirected_url)
# http://www.afaqs.com/interviews/index.html?id=572_The-target-is-to-get-advertisers-to-switch-from-print-to-TV-Ravish-Kumar-Viacom18
r = requests.get(redirected_url)
# Start scraping from this link...
Or, in place of the split() line, using a regex (non-greedy, so it stops at the first '">'):
import re

redirected_url = re.findall(r'URL=(http.*?)">', r.text)[0]
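Both the split and the regex depend on the exact markup above (upper-case URL=, an absolute URL, the URL directly followed by '">'). A somewhat more tolerant sketch, assuming the redirect is a standard meta refresh, parses the tag with html.parser, resolves relative URLs with urljoin, and keeps following refreshes until it reaches a page without one, returning that final URL together with its HTML. The helpers meta_refresh_target and follow_meta_refreshes and the fetch callback are hypothetical names; fetch is passed in so the logic can be tried without network access.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class MetaRefreshParser(HTMLParser):
    """Records the content attribute of the first <meta http-equiv="refresh">."""

    def __init__(self):
        super().__init__()
        self.content = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attribute names arrive lower-cased
        if (tag == "meta" and self.content is None
                and (attrs.get("http-equiv") or "").lower() == "refresh"):
            self.content = attrs.get("content") or ""


def meta_refresh_target(html, base_url):
    """Return the absolute URL a meta refresh points to, or None."""
    parser = MetaRefreshParser()
    parser.feed(html)
    if parser.content is None:
        return None
    # content looks like "0;URL=http://...": a delay, then an optional URL
    match = re.search(r"""url\s*=\s*['"]?([^'"\s]+)""", parser.content, re.IGNORECASE)
    if not match:
        return None
    return urljoin(base_url, match.group(1))


def follow_meta_refreshes(url, fetch, max_hops=5):
    """Follow meta refreshes; fetch(url) must return (final_url, html)."""
    for _ in range(max_hops):
        final_url, html = fetch(url)
        target = meta_refresh_target(html, final_url)
        if target is None or target == final_url:
            return final_url, html
        url = target
    raise RuntimeError("too many meta refresh hops")


# Usage with Requests (needs network access):
#   import requests
#   def fetch(url):
#       r = requests.get(url)
#       return r.url, r.text  # r.url already reflects any HTTP 3xx redirects
#   final_url, html = follow_meta_refreshes(url, fetch)
```

Keeping fetch as a parameter separates the parsing from the HTTP client, and max_hops guards against pages that refresh to each other in a loop.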