ankeet jena
ankeet jena

Reputation: 13

response status is not 200 on running webdriver to get url and on running beautiful soup to extract content, it throws attribute error

I have been trying to web scrape hotel reviews but on multiple page jumps, the url of the webpage doesn't change. So I am using webdriver from selenium to work this out. It is not showing any error but on checking if the response status is 200, it is showing false. In addition to that, running the line of code which I have mentioned below generates an error. If anyone can fix the issue, effort will be highly appreciated!

!pip install selenium
from selenium import webdriver
import requests
from bs4 import BeautifulSoup
import pandas as pd
# install chromium, its driver, and selenium
!apt-get update
!apt install chromium-chromedriver
!cp /usr/lib/chromium-browser/chromedriver /usr/bin

# set options to be headless, ..
from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
# open it, go to a website, and get results
wd = webdriver.Chrome('chromedriver',options=options)

code = wd.get('https://www.goibibo.com/hotels/highland-park-hotel-in-trivandrum-1383427384655815037/?hquery={%22ci%22:%2220211209%22,%22co%22:%2220211210%22,%22r%22:%221-2-0%22,%22ibp%22:%22v15%22}&hmd=766931490eb7863d2f38f56c6185a1308de782c89dfeeea59d262b827ca15441bf50472cbfdc1ee84aeed8af756809a2e89cfd6eaea0fa308c1ca839e8c313d016ac0f5948658353cf30f1cd83050fd8e6adb2e55f2a5470cadeb0c28b7becc92ac44d81966b82408effde826d40fbff47525e09b5f145e321fe6d104e12933c066323798e33a911e0cbed7312fc1634f8f92fe502c8602556c9a02f34c047d04ff1400c995799156776c1a04e218d6486493edad5b0f7e51a5ea25f5f1cb4f5ed497ee9368137f6ec73b3b1166ee7c1a885920b90c98542e0270b4fa9004005cfe87a4d1efeaedc8e33a848f73345f09bec19153e8bf625cc7f9216e692a1bcc313e7f13a7fc091328b1fb43598bd236994fdc988ab35e70cf3a5d1856c0b0fa9794b23a1a958a5937ac6d258d121a75b7ce9fc70b9a820af43a8e9a3f279be65b5c6fbfff2ba20bfb0f3e3ee425f0b930bf671c50878a540c6a9003b197622b6ab22ae39e07b5174cb12bebbcd2a132bb8570e01b9e253c1bd83cb292de97a&cc=IN&reviewType=gi&vcid=3877384277955108166&srpFilters={%22type%22:[%22Hotel%22]}')

str(code) == "<Response [200]>"
**Output: ** False 

 soup = BeautifulSoup(code.content,'html.parser') 
 

On running the below line of code, there comes an error:

AttributeError Traceback (most recent call last) in () ----> 1 soup = BeautifulSoup(code.content,'html.parser')

AttributeError: 'NoneType' object has no attribute 'content'

Upvotes: 0

Views: 609

Answers (1)

undetected Selenium
undetected Selenium

Reputation: 193138

get()

get(url: str) loads a web page in the current browser session and doesn't returns anything.

Hence, as per your code, code will be always NULL.


Solution

To validate the Response you can adopt any of the two approaches:

  • Using requests.head():

    import requests
    
    request_response = requests.head(https://www.goibibo.com/hotels/highland-park-hotel-in-trivandrum-1383427384655815037/?hquery={%22ci%22:%2220211209%22,%22co%22:%2220211210%22,%22r%22:%221-2-0%22,%22ibp%22:%22v15%22}&hmd=766931490eb7863d2f38f56c6185a1308de782c89dfeeea59d262b827ca15441bf50472cbfdc1ee84aeed8af756809a2e89cfd6eaea0fa308c1ca839e8c313d016ac0f5948658353cf30f1cd83050fd8e6adb2e55f2a5470cadeb0c28b7becc92ac44d81966b82408effde826d40fbff47525e09b5f145e321fe6d104e12933c066323798e33a911e0cbed7312fc1634f8f92fe502c8602556c9a02f34c047d04ff1400c995799156776c1a04e218d6486493edad5b0f7e51a5ea25f5f1cb4f5ed497ee9368137f6ec73b3b1166ee7c1a885920b90c98542e0270b4fa9004005cfe87a4d1efeaedc8e33a848f73345f09bec19153e8bf625cc7f9216e692a1bcc313e7f13a7fc091328b1fb43598bd236994fdc988ab35e70cf3a5d1856c0b0fa9794b23a1a958a5937ac6d258d121a75b7ce9fc70b9a820af43a8e9a3f279be65b5c6fbfff2ba20bfb0f3e3ee425f0b930bf671c50878a540c6a9003b197622b6ab22ae39e07b5174cb12bebbcd2a132bb8570e01b9e253c1bd83cb292de97a&cc=IN&reviewType=gi&vcid=3877384277955108166&srpFilters={%22type%22:[%22Hotel%22]})
    status_code = request_response.status_code
    if status_code == 200:
        print("URL is valid/up")
    else:
        print("URL is invalid/down")
    
  • Using urlopen():

    import requests
    import urllib
    
    status_code = urllib.request.urlopen(https://www.goibibo.com/hotels/highland-park-hotel-in-trivandrum-1383427384655815037/?hquery={%22ci%22:%2220211209%22,%22co%22:%2220211210%22,%22r%22:%221-2-0%22,%22ibp%22:%22v15%22}&hmd=766931490eb7863d2f38f56c6185a1308de782c89dfeeea59d262b827ca15441bf50472cbfdc1ee84aeed8af756809a2e89cfd6eaea0fa308c1ca839e8c313d016ac0f5948658353cf30f1cd83050fd8e6adb2e55f2a5470cadeb0c28b7becc92ac44d81966b82408effde826d40fbff47525e09b5f145e321fe6d104e12933c066323798e33a911e0cbed7312fc1634f8f92fe502c8602556c9a02f34c047d04ff1400c995799156776c1a04e218d6486493edad5b0f7e51a5ea25f5f1cb4f5ed497ee9368137f6ec73b3b1166ee7c1a885920b90c98542e0270b4fa9004005cfe87a4d1efeaedc8e33a848f73345f09bec19153e8bf625cc7f9216e692a1bcc313e7f13a7fc091328b1fb43598bd236994fdc988ab35e70cf3a5d1856c0b0fa9794b23a1a958a5937ac6d258d121a75b7ce9fc70b9a820af43a8e9a3f279be65b5c6fbfff2ba20bfb0f3e3ee425f0b930bf671c50878a540c6a9003b197622b6ab22ae39e07b5174cb12bebbcd2a132bb8570e01b9e253c1bd83cb292de97a&cc=IN&reviewType=gi&vcid=3877384277955108166&srpFilters={%22type%22:[%22Hotel%22]}).getcode()
    if status_code == 200:
        print("URL is valid/up")
    else:
        print("URL is invalid/down")
    

Upvotes: 0

Related Questions