Reputation: 165
I have a list of Kayak URLs and I'd like to grab the price and the link in "View Deal" for the "Best" and "Cheapest" HTML cards. These are essentially the first two results, since I've already sorted the results in the URLs (an example URL is in the code below).
I haven't managed to grab these bits of data using BeautifulSoup and I could use some help! Here's what I've tried for pulling the price info, but I'm getting an empty prices_list variable.
from random import randint
from time import sleep
from selenium import webdriver
url = "https://www.kayak.com/flights/AMS-WMI,nearby/2023-02-15/WMI-SOF,nearby/2023-02-18/SOF-BEG,nearby/2023-02-20/BEG-MIL,nearby/2023-02-23/MIL-AMS,nearby/2023-02-25/?sort=bestflight_a"
requests = 0
chrome_options = webdriver.ChromeOptions()
agents = ["Firefox/66.0.3","Chrome/73.0.3683.68","Edge/16.16299"]
print("User agent: " + agents[(requests%len(agents))])
chrome_options.add_argument('--user-agent=' + agents[(requests%len(agents))])
chrome_options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome('/Users/etc./etc.', options=chrome_options)
driver.implicitly_wait(10)
driver.get(url)
# getting the prices
sleep(randint(8,10))
xp_prices = '//a[@class="booking-link"]/span[@class="price option-text"]'
prices = driver.find_elements_by_xpath(xp_prices)
prices_list = [price.text.replace('$','') for price in prices if price.text != '']
prices_list = list(map(int, prices_list))
Upvotes: 1
Views: 101
Reputation: 193188
To extract the prices from View Deal for the Best and Cheapest sections of the website, you have to induce WebDriverWait for visibility_of_element_located(), and you can use the following locator strategies:
From the Best section:
driver.get("https://www.kayak.com/flights/AMS-WMI,nearby/2023-02-15/WMI-SOF,nearby/2023-02-18/SOF-BEG,nearby/2023-02-20/BEG-MIL,nearby/2023-02-23/MIL-AMS,nearby/2023-02-25/?sort=bestflight_a")
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[text()='Best']//following::div[contains(@class, 'bottom-booking')]//a//div[contains(@class, 'price-text')]"))).text)
Console output:
$807
From the Cheapest section:
driver.get("https://www.kayak.com/flights/AMS-WMI,nearby/2023-02-15/WMI-SOF,nearby/2023-02-18/SOF-BEG,nearby/2023-02-20/BEG-MIL,nearby/2023-02-23/MIL-AMS,nearby/2023-02-25/?sort=bestflight_a")
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[text()='Cheapest']//following::div[contains(@class, 'bottom-booking')]//a//div[contains(@class, 'price-text')]"))).text)
Console output:
$410
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
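Since the question also asks for the "View Deal" link, the two lookups above could be folded into one helper that returns both the price text and the link. This is only a sketch: `section_price_xpath` and `get_price_and_link` are hypothetical names, and it assumes the nearest `a` element enclosing the price `div` is the "View Deal" link, which the XPath above suggests but which has not been verified against the live page.

```python
def section_price_xpath(label: str) -> str:
    """Build the price locator for the card that follows a given tab label."""
    return (
        f"//div[text()='{label}']"
        "//following::div[contains(@class, 'bottom-booking')]"
        "//a//div[contains(@class, 'price-text')]"
    )


def get_price_and_link(driver, label: str, timeout: int = 20):
    """Return (price_text, href) for the first card after the given label."""
    # Imported here so the XPath helper above stays usable without Selenium.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # Wait until the price element is visible, as in the answer above.
    price_el = WebDriverWait(driver, timeout).until(
        EC.visibility_of_element_located((By.XPATH, section_price_xpath(label)))
    )
    # Assumption: the nearest enclosing <a> is the "View Deal" link.
    link_el = price_el.find_element(By.XPATH, "./ancestor::a[1]")
    return price_el.text, link_el.get_attribute("href")
```

After `driver.get(url)`, calling `get_price_and_link(driver, "Best")` and `get_price_and_link(driver, "Cheapest")` would give one pair per card, provided the page structure still matches these locators.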
Upvotes: 1
Reputation: 33361
There are 2 problems here with the locator XPath:
The a element's class name is not "booking-link" but "booking-link " with a trailing space, so the exact match @class="booking-link" finds nothing. Using contains(@class,'booking-link') matches it.
The suggested locator additionally anchors the search under div[@class='above-button'].
So, the relevant code line could be:
xp_prices = "//div[@class='above-button']//a[contains(@class,'booking-link')]/span[@class='price option-text']"
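A rough sketch of how the corrected locator might be plugged back into the price-collection step from the question. Two things here are not from the original code: `find_elements(By.XPATH, ...)` replaces `find_elements_by_xpath`, which no longer exists in current Selenium 4 releases, and the hypothetical helper `parse_price` strips every non-digit character so that text like "$1,234" does not break `int()`. It assumes whole-dollar prices.

```python
import re

# Locator taken from the answer above.
XP_PRICES = (
    "//div[@class='above-button']"
    "//a[contains(@class,'booking-link')]"
    "/span[@class='price option-text']"
)


def parse_price(text: str):
    """Turn text such as '$1,234' into 1234; return None if it has no digits.

    Assumes whole-dollar amounts: a decimal point would simply be dropped.
    """
    digits = re.sub(r"\D", "", text)
    return int(digits) if digits else None


def collect_prices(driver):
    """Return all non-empty prices matched by XP_PRICES as integers."""
    # Imported here so parse_price can be used without Selenium installed.
    from selenium.webdriver.common.by import By

    elements = driver.find_elements(By.XPATH, XP_PRICES)
    prices = (parse_price(el.text) for el in elements)
    return [p for p in prices if p is not None]
```

With the page loaded, `collect_prices(driver)[:2]` would correspond to the first two cards, assuming the sort order in the URL is respected.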
Upvotes: 1