Reputation: 33
I'm having problems getting the HTML content of a web page.
On this website: https://tmofans.com/library/manga/5763/nisekoi, when you click the play icon, for example on "Capitulo 230.00", it opens this link: https://tmofans.com/goto/347231, which redirects you to https://tmofans.com/viewer/5c187dcea0240/paginated.
The problem is that when you open https://tmofans.com/goto/347231 directly, the page gives a 403 Forbidden response. The only way to be redirected to the final page is by clicking the play button on the first page.
I want to get the final URL's content using only the tmofans.com/goto link.
I tried to get the HTML content using requests and BeautifulSoup:
import requests
from bs4 import BeautifulSoup

response = requests.get("https://tmofans.com/goto/347231")
page = str(BeautifulSoup(response.content, 'html.parser'))
print(page)
When I do this with https://tmofans.com/goto/347231, I only get the content of the 403 Forbidden page.
Upvotes: 2
Views: 648
Reputation: 383
I once managed to scrape some protected pages using http.client and my browser.
I first navigated to the page I needed access to, then, using the browser's developer tools, I copied the request headers and used them in my script. That way your script accesses the resources the same way your browser does.
These two methods can help you: the first parses the raw HTTP request to get the headers (the request line and body may also be helpful depending on your case), and the second uses them to download the file.
This may need some tweaking on your side to work.
import re
from http.client import HTTPSConnection

def parse_headers(http_post):
    """Converts a raw request string to its request line, headers and body."""
    # Regexes to extract the request line and the header fields
    req_line = re.compile(r'(?P<method>GET|POST)\s+(?P<resource>.+?)\s+(?P<version>HTTP/1.1)')
    field_line = re.compile(r'\s*(?P<key>.+\S)\s*:\s+(?P<value>.+\S)\s*')

    first_line_end = http_post.find('\n')
    headers_end = http_post.find('\n\n')

    request = req_line.match(http_post[:first_line_end]).groupdict()
    headers = dict(field_line.findall(http_post[first_line_end:headers_end]))
    body = http_post[headers_end + 2:]

    return request, headers, body

def get_file(url, domain, headers, temp_directory):
    """
    Fetches the file located at the provided URL and returns its content.
    Uses `headers` to bypass auth.
    """
    conn = HTTPSConnection(domain)
    conn.request('GET', url, headers=headers)
    response = conn.getresponse()

    content_type = response.getheader('Content-Type')
    content_disp = response.getheader('Content-Disposition')

    # Change to whatever content type you need
    if content_type != 'application/pdf':
        conn.close()
        return
    else:
        file_content = response.read()
        conn.close()
        return file_content
The headers string should look like this:
GET /fr/backend/XXXXXXXXX/845080 HTTP/1.1
Cookie: cookie_law_consented=true; landing_page=0; _ga=GA1.2.1218703015.1546948765; _gid=GA1.2.580320014.1546948765; _jt=1.735724042.1546948764; SID=5c485bfa-3f2c-425e-a2dd-32dd800e0bb3
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding: br, gzip, deflate
Host: XXXXX
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_2) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.0.2 Safari/605.1.15
Accept-Language: fr-fr
Referer: XXXXX
Connection: keep-alive
It may change depending on the website, but using these headers allowed me to download files behind a login.
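To sanity-check the parsing step offline, here is a minimal, self-contained sketch that applies the same two regexes to a shortened version of the sample request above (the host and cookie values are placeholders, not real values from the site):

```python
import re

# Raw request as copied from the browser's developer tools
# (shortened; host and cookies are placeholders).
raw = (
    "GET /fr/backend/XXXXXXXXX/845080 HTTP/1.1\n"
    "Cookie: cookie_law_consented=true; landing_page=0\n"
    "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\n"
    "Host: example.com\n"
    "Connection: keep-alive\n"
    "\n"
)

# Same regexes as in parse_headers above
req_line = re.compile(r'(?P<method>GET|POST)\s+(?P<resource>.+?)\s+(?P<version>HTTP/1.1)')
field_line = re.compile(r'\s*(?P<key>.+\S)\s*:\s+(?P<value>.+\S)\s*')

first_line_end = raw.find('\n')
headers_end = raw.find('\n\n')

# The request line becomes a dict, the header fields a dict of key/value pairs
request = req_line.match(raw[:first_line_end]).groupdict()
headers = dict(field_line.findall(raw[first_line_end:headers_end]))

print(request['method'])    # GET
print(headers['Host'])      # example.com
```

The resulting `headers` dict can then be passed straight to `get_file` (or to `http.client`'s `conn.request`).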
Upvotes: 0
Reputation: 8225
This website checks whether the request carries a Referer from their own site and returns a 403 response otherwise. You can easily bypass this by setting a Referer header.
import requests

ref = 'https://tmofans.com'
headers = {'Referer': ref}
r = requests.get('https://tmofans.com/goto/347231', headers=headers)
print(r.url)
print(r.status_code)
Output
https://tmofans.com/viewer/5c187dcea0240/paginated
200
Upvotes: 2