I want to learn Python, so I started with a small web-scraping project: a competitive scorecard for a travel agency. First of all, here is the site: tn.tunisiebooking.com
As you can see, you have to fill out a form, and then a list of hotels is displayed. I managed to automate the search, but I got stuck at the data-extraction step: for some reason, the scraper goes back and extracts the data from the home page.
Could you help me and explain why this happens? Thanks in advance. Here is the code I used:
from selenium import webdriver
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import requests
import pandas as pd
PATH = r"C:\chromedriver.exe"
driver = webdriver.Chrome(PATH)
driver.get('https://tn.tunisiebooking.com/')
wait = WebDriverWait(driver, 20)
# JavaScript that fills in the search form fields
script = "document.getElementById('ville_des').value ='Sousse';document.getElementById('depart').value ='05/08/2021';document.getElementById('checkin').value ='05/08/2021';document.getElementById('select_ch').value = '1';"
# run the script in the page to fill in the form
driver.execute_script(script)
btn_rechercher = driver.find_element(By.ID, 'boutonr')
btn_rechercher.click()
print(driver.current_url)
r = requests.get(driver.current_url)
soup = BeautifulSoup(r.text, 'html.parser')
results = soup.find_all('div', attrs={'class': 'bloc_titre'})
print(len(results))
records = []
for result in results:
    nom = result.find('a').text
    records.append(nom)  # (nom) is just nom, not a tuple; a 1-tuple would be (nom,)
print(len(records))
df = pd.DataFrame(records, columns=['nom'])
print(df.head())
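The `script` string above hard-codes the destination and dates. To try other searches, it may help to build that string from Python values. A minimal sketch, assuming the element IDs `ville_des`, `depart`, `checkin` and `select_ch` from the question's script; `build_search_script` is a hypothetical helper, and `json.dumps` is used only to turn each value into a safely quoted JavaScript string literal. The date strings are passed through unchanged, so the caller must supply whatever format the site expects.

```python
import json


def build_search_script(city, depart, checkin, rooms):
    """Return JavaScript that fills in the search form fields.

    Hypothetical helper: the element IDs are the ones used in the question.
    """
    values = {
        "ville_des": city,
        "depart": depart,
        "checkin": checkin,
        "select_ch": str(rooms),
    }
    # json.dumps yields a double-quoted literal with quotes and backslashes escaped,
    # which is also valid JavaScript string syntax
    return "".join(
        f"document.getElementById({json.dumps(element_id)}).value = {json.dumps(value)};"
        for element_id, value in values.items()
    )


script = build_search_script("Sousse", "05/08/2021", "05/08/2021", 1)
print(script)
```

The result is passed to `driver.execute_script(script)` as before. Selenium's `execute_script` also accepts extra arguments that the script reads as `arguments[0]`, `arguments[1]`, and so on, which avoids string building altogether.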
For more details, this is the home page: HomePage
And this is the page I want to scrape, which opens after I submit the form with my destination and dates: hotelList
The problem is that my code outputs the list from the home page, not from the second page: Output
I hope this makes it clear. Thank you.
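A likely explanation, offered as a guess because the site's internals are not visible here: `requests.get(driver.current_url)` sends a brand-new request that shares none of the browser's cookies or submitted form data, and `current_url` is read immediately after `click()`, possibly before the results page has loaded. Either way, `requests` can receive the home page. One option is to parse the HTML the browser already holds, for example `BeautifulSoup(driver.page_source, 'html.parser')` after waiting for the results. The sketch below shows only the extraction step, using the standard-library `html.parser` in place of BeautifulSoup so that it runs without a browser; `HotelNameParser`, `extract_hotel_names` and the sample markup are hypothetical, and the `bloc_titre` class comes from the question's own code.

```python
from html.parser import HTMLParser


class HotelNameParser(HTMLParser):
    """Collect the text of the first <a> inside each <div class="bloc_titre">.

    Hypothetical helper mirroring the question's soup.find_all(...) / find('a') logic.
    """

    def __init__(self):
        super().__init__()
        self.names = []
        self._div_depth = 0      # <div> nesting depth inside the current bloc_titre (0 = outside)
        self._in_link = False    # True while reading the first <a> of the current bloc_titre
        self._took_link = False  # True once that first <a> has been read
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            if self._div_depth:
                self._div_depth += 1
            elif "bloc_titre" in (dict(attrs).get("class") or "").split():
                self._div_depth = 1
                self._took_link = False
        elif tag == "a" and self._div_depth and not self._took_link:
            self._in_link = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            self.names.append("".join(self._buffer).strip())
            self._in_link = False
            self._took_link = True
        elif tag == "div" and self._div_depth:
            self._div_depth -= 1

    def handle_data(self, data):
        if self._in_link:
            self._buffer.append(data)


def extract_hotel_names(html):
    parser = HotelNameParser()
    parser.feed(html)
    parser.close()
    return parser.names


sample = (
    '<div class="bloc_titre"><a href="/h1"><h3>Hotel A</h3></a></div>'
    '<div class="bloc_titre"><span><a href="/h2">Hotel B</a></span><a href="/x">map</a></div>'
)
print(extract_hotel_names(sample))  # ['Hotel A', 'Hotel B']
```

In the real script, the argument would be `driver.page_source` rather than a sample string.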
This gets the hotel names using Selenium only:
from time import sleep
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
PATH = r"C:\chromedriver.exe"
driver = webdriver.Chrome(PATH)
driver.get('https://tn.tunisiebooking.com/')
wait = WebDriverWait(driver, 20)
# your script seems fine
script = "document.getElementById('ville_des').value ='Sousse';document.getElementById('depart').value ='05/08/2021';document.getElementById('checkin').value ='05/08/2021';document.getElementById('select_ch').value = '1';"
# fill in the search form via JavaScript
driver.execute_script(script)
btn_rechercher = driver.find_element(By.ID, 'boutonr')
btn_rechercher.click()
# give the results page time to load
sleep(10)
# get the hotel names by absolute XPath, one result per loop iteration
for v in range(1, 20):
    hotel_name = driver.find_element(By.XPATH, f'/html/body/div[6]/div[2]/div[1]/div/div[2]/div/div[4]/div[{v}]/div/div[3]/div[1]/div[1]/span/a/h3').get_attribute('innerHTML')
    print(hotel_name)
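The absolute XPath above depends on the exact position of every ancestor element, and `range(1, 20)` assumes at least 19 results: if the page lists fewer, `find_element` raises `NoSuchElementException`. A sturdier pattern is to fetch all matches at once with a relative expression, for example `driver.find_elements(By.XPATH, "//div[contains(@class, 'bloc_titre')]//a")`, which returns an empty list instead of raising when nothing matches (assuming the names sit under the `bloc_titre` divs the question's code targets). The sketch below demonstrates the same relative-query idea offline with the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath (exact `[@class='...']` matches, no `contains()`); it is not Selenium code, and `hotel_names` and the markup are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; real pages are HTML and not always well-formed XML.
SAMPLE = """
<body>
  <div class="results">
    <div class="bloc_titre"><span><a href="/h1"><h3>Hotel A</h3></a></span></div>
    <div class="bloc_titre"><span><a href="/h2"><h3>Hotel B</h3></a></span></div>
    <div class="other"><a href="/x">Not a hotel</a></div>
  </div>
</body>
"""


def hotel_names(markup):
    root = ET.fromstring(markup)
    # Relative query: every <a> anywhere under a div whose class is exactly "bloc_titre",
    # regardless of how many results exist or how deeply they are nested.
    links = root.findall(".//div[@class='bloc_titre']//a")
    # itertext() also collects text from nested children such as <h3>
    return ["".join(link.itertext()).strip() for link in links]


print(hotel_names(SAMPLE))  # ['Hotel A', 'Hotel B']
```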
I'm not sure what other details you want, but this prints the hotel names for your input.
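A fixed `sleep(10)` either wastes time or, on a slow connection, is not long enough. Selenium's explicit waits poll a condition until it holds or a timeout expires, e.g. `wait.until(EC.presence_of_all_elements_located((By.CLASS_NAME, "bloc_titre")))` with `EC` imported from `selenium.webdriver.support.expected_conditions`, assuming the results page uses the `bloc_titre` class from the question. To show what such a wait does without a browser, here is a plain-Python sketch of the polling loop; `wait_until` is a hypothetical helper illustrating the idea, not Selenium's own implementation.

```python
import time


def wait_until(condition, timeout=20.0, poll_interval=0.5):
    """Call condition() until it returns a truthy value or `timeout` seconds pass.

    Hypothetical helper illustrating the polling idea behind WebDriverWait.until.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Usage with a stand-in condition that succeeds on the third poll.
attempts = {"count": 0}


def results_loaded():
    attempts["count"] += 1
    return ["Hotel A", "Hotel B"] if attempts["count"] >= 3 else []


print(wait_until(results_loaded, timeout=5, poll_interval=0.01))  # ['Hotel A', 'Hotel B']
```

Like `WebDriverWait.until`, the helper returns the condition's truthy value, so the matched results can be used directly once the wait succeeds.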