HiFAR

Reputation: 48

Python web scraping: problem with the scraped URL

I want to learn Python, so I started with a small web scraping project. I want to build a competitive scorecard for a travel agency. First of all, here is the site link: tn.tunisiebooking.com

As you can see, you have to fill out the form, and then a list of hotels is displayed. I managed to automate the search, but I got stuck at the data extraction step: I don't know why it goes back and extracts the data from the home page.

Could you help me and explain why this happens? Thank you in advance. Here is the code I used:

import time
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import requests



PATH = r"C:\chromedriver.exe"
driver = webdriver.Chrome(PATH)
driver.get('https://tn.tunisiebooking.com/')
wait = WebDriverWait(driver, 20)


# write script
script = "document.getElementById('ville_des').value ='Sousse';document.getElementById('depart').value ='05/08/2021';document.getElementById('checkin').value ='05/08/2021';document.getElementById('select_ch').value = '1';"
  
    
# run the script in the page to fill in the search form fields
driver.execute_script(script)

btn_rechercher = driver.find_element_by_id('boutonr')
btn_rechercher.click()

print(driver.current_url)
r = requests.get(driver.current_url)

soup = BeautifulSoup(r.text, 'html.parser')

results = soup.find_all('div', attrs={'class':'bloc_titre'})


len(results)


records = []
for result in results:
    nom = result.find('a').text
    records.append(nom)
len(records)
import pandas as pd
df = pd.DataFrame(records, columns=['nom'])
df.head()

For more details, this is the home page: HomePage

And this is the page I want to scrape. It opens after I submit the form with my destination and date: hotelList

The problem is that the output of my code shows the list from the home page, not from the second page: Output

I hope I have made it clear now. Thank you.

Upvotes: 2

Views: 99

Answers (1)

CatChMeIfUCan

Reputation: 569

This will get the names of the hotels using Selenium only:

from time import sleep
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait

PATH = r"C:\chromedriver.exe"
driver = webdriver.Chrome(PATH)
driver.get('https://tn.tunisiebooking.com/')
wait = WebDriverWait(driver, 20)

# Build the JavaScript that fills in the search form (your script seems fine)
script = "document.getElementById('ville_des').value ='Sousse';document.getElementById('depart').value ='05/08/2021';document.getElementById('checkin').value ='05/08/2021';document.getElementById('select_ch').value = '1';"

# Run the script in the page to set the form fields
driver.execute_script(script)

btn_rechercher = driver.find_element_by_id('boutonr')
btn_rechercher.click()
sleep(10)
# Get the hotel names by XPath in a loop
for v in range(1, 20):
    hotel_name = driver.find_element_by_xpath('/html/body/div[6]/div[2]/div[1]/div/div[2]/div/div[4]/div[' + str(v) + ']/div/div[3]/div[1]/div[1]/span/a/h3').get_attribute('innerHTML')
    print(hotel_name)

I don't know what other details you want but this is an example of hotel names based on your input
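
As for why the original code returned the home page: `requests.get(driver.current_url)` sends a separate HTTP request that does not share the browser's session or the form data submitted through Selenium, and `current_url` may also be read before the search has finished loading. Either way, the server most likely answers as it would to a fresh visitor. A minimal sketch of one way around this, assuming the results page still uses the `div.bloc_titre` blocks from the question, is to parse the HTML the browser is actually showing with BeautifulSoup. The function name `hotel_names_from_html` is hypothetical, and the sample markup is made up for illustration.

```python
from bs4 import BeautifulSoup


def hotel_names_from_html(html):
    """Return the link text of every <a> found inside a div.bloc_titre block."""
    soup = BeautifulSoup(html, "html.parser")
    names = []
    for block in soup.find_all("div", attrs={"class": "bloc_titre"}):
        link = block.find("a")
        if link is not None:  # skip blocks that contain no link
            names.append(link.get_text(strip=True))
    return names


# Made-up sample markup, standing in for driver.page_source.
sample_html = """
<div class="bloc_titre"><a href="#"> Hotel A </a></div>
<div class="bloc_titre"><span>no link here</span></div>
<div class="bloc_titre"><a href="#">Hotel B</a></div>
"""
print(hotel_names_from_html(sample_html))
```

With Selenium, the call would be `hotel_names_from_html(driver.page_source)` once the results have loaded, instead of fetching the URL again with `requests`.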

Upvotes: 1
