Unable to read html page from beautiful soup

Question

The code below hangs after printing hi. Can you please check what is wrong with it? Is the site secured, so that I need some special authentication?

from bs4 import BeautifulSoup
import requests

print('hi')
rooturl = 'http://www.hoovers.com/company-information/company-search.html'
r = requests.get(rooturl)
print('hi1')
soup = BeautifulSoup(r.content, "html.parser")
print('hi2')
print(soup)

KC. · Accepted Answer

You get this problem because the website decides you are a robot and sends nothing back. It even leaves the connection open, so you wait forever.

If you imitate a browser's request, the server will no longer treat you as a robot.

Adding headers is the simplest way to deal with this problem. Sometimes, though, passing only User-Agent is not enough (as in this case). Copy your browser's request headers and remove the useless ones through testing. If you are lazy, use the browser's headers directly, but do not copy all of them when you want to upload files.

from bs4 import BeautifulSoup
import requests

rooturl = 'http://www.hoovers.com/company-information/company-search.html'

with requests.Session() as se:
    # Send browser-like headers so the server does not treat the request as a robot.
    se.headers.update({
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36",
        "Accept-Encoding": "gzip, deflate",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
        "Accept-Language": "en"
    })
    # A timeout (in seconds) makes a stalled connection raise an exception instead of hanging forever.
    resp = se.get(rooturl, timeout=10)

print(resp.content)
soup = BeautifulSoup(resp.content, "html.parser")
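
The advice above about removing useless headers "through testing" can be automated. The following is only a rough sketch, not a definitive method: the helper name minimal_headers and the works callback are made up for illustration, and fake_works is a stand-in that pretends the server needs two particular headers, so the example runs without touching the network. The User-Agent value is shortened to a placeholder.

```python
from typing import Callable, Dict

Headers = Dict[str, str]


def minimal_headers(headers: Headers, works: Callable[[Headers], bool]) -> Headers:
    """Drop each header in turn; keep it only if the request fails without it."""
    if not works(headers):
        raise ValueError("the full header set does not work; nothing to minimize")
    kept = dict(headers)
    for name in list(headers):
        trial = {k: v for k, v in kept.items() if k != name}
        if works(trial):
            kept = trial  # still succeeds, so this header was not needed
    return kept


# Hypothetical stand-in for a real request: assume the server
# insists on User-Agent and Accept-Language and ignores the rest.
def fake_works(h: Headers) -> bool:
    return "User-Agent" in h and "Accept-Language" in h


browser_headers = {
    "User-Agent": "Mozilla/5.0",  # placeholder, not a full browser string
    "Accept-Encoding": "gzip, deflate",
    "Accept": "text/html,application/xhtml+xml",
    "Accept-Language": "en",
}

needed = minimal_headers(browser_headers, fake_works)
print(sorted(needed))  # ['Accept-Language', 'User-Agent']
```

For a real site, works would send the request with the trial headers and a timeout, and return False when the request times out or the response is not the expected page. Removing one header at a time is a greedy pass, so the result is a reasonably small set rather than a guaranteed minimum.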
