I'm trying to write a simple scraping loop that picks up titles from dynamic pages. I've written a small script that works the way I expected. Here is the working script:
from selenium import webdriver

driver = webdriver.Chrome('C:/Users/user/Downloads/chromedriver_win32/chromedriver.exe')
url = "https://www.youtube.com/user/LinusTechTips/videos"
driver.get(url)

videos = driver.find_elements_by_xpath('.//*[@id="dismissable"]')
for video in videos:
    title = video.find_element_by_xpath('.//*[@id="video-title"]').text
    print(title)
It correctly iterates over the divs that contain each video's title and other details, and scrapes the titles. But this script only seems to work on YouTube. I've tried it on Craigslist, Amazon, Books to Scrape, Rightmove and Hostelworld, and it doesn't work on any of those pages. Here is the script for Hostelworld:
from selenium import webdriver

driver = webdriver.Chrome('C:/Users/user/Downloads/chromedriver_win32/chromedriver.exe')
url = "https://www.hostelworld.com/s?q=New%20York,%20New%20York,%20USA&country=USA&city=New%20York&type=city&id=13&from=2020-08-14&to=2020-08-16&guests=2&page=1"
driver.get(url)

cards = driver.find_elements_by_xpath('.//*[@id="__layout"]/div/div[1]/div[4]/div/div/div[3]')
for card in cards:
    title = card.find_element_by_xpath('.//*[@id="__layout"]/div/div[1]/div[4]/div/div/div[3]/div[2]/div[1]/h2/a').text
    print(title)
I'm pretty sure the class name for the cards is correct, since I found it by searching in Chrome DevTools. I think the title XPath is correct because it prints correctly when I use it outside the loop. I think the loop is correct too, because if I change the cards variable to:
cards = driver.find_elements_by_class_name('property-card')
it prints a title once for every card on the page.
But when I add a leading "." to the title XPath, it raises the error "Message: no such element: Unable to locate element: ...". I'm prepending "." so that the expression only searches the parent element currently being iterated over, not the whole page. For some reason, though, adding "." throws this error on every website I tried except YouTube.
I'm trying to stick to XPath as much as possible, because not all websites follow good class and id conventions.
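A leading "." makes an XPath relative to the element it is called on, and ".//" then searches only that element's descendants. The long title path starts at the element with id="__layout", which (judging by the cards XPath above) is an ancestor of every card rather than a descendant, so it cannot match once the search is anchored to a card. On YouTube, by contrast, the element with id="video-title" sits inside each dismissable element, so the relative search succeeds. The sketch below reproduces this offline with the standard library's xml.etree.ElementTree, which supports only a small subset of XPath; the markup is a hypothetical, heavily simplified stand-in for the real page, and the short path .//h2/a is an assumption about its structure.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: the live page is far more deeply nested.
PAGE = """
<html><body>
  <div id="__layout">
    <div class="property-card">
      <h2 class="title title-6"><a>The Local NYC</a></h2>
    </div>
    <div class="property-card">
      <h2 class="title title-6"><a>HI NYC Hostel</a></h2>
    </div>
  </div>
</body></html>
"""

root = ET.fromstring(PAGE.strip())

# Searched from the document root, the long path matches (like calling it
# on the driver outside the loop); find() returns only the first match.
from_root = root.find(".//*[@id='__layout']/div/h2/a").text

broken, fixed = [], []
for card in root.findall(".//div[@class='property-card']"):
    # ".//" only looks *below* the card, but id="__layout" is *above* it.
    broken.append(card.find(".//*[@id='__layout']/div/h2/a"))
    # A path that starts from the card itself finds that card's title.
    fixed.append(card.find(".//h2/a").text)

print(from_root)  # The Local NYC
print(broken)     # [None, None]
print(fixed)      # ['The Local NYC', 'HI NYC Hostel']
```

In the Selenium loop, the corresponding change would be something like card.find_element_by_xpath('.//h2/a'), assuming the live page still nests each title link in an h2 inside the card (the answer's CSS selector div.property-card h2.title.title-6>a relies on the same nesting).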
To get the titles of all properties, induce WebDriverWait() and wait for visibility_of_all_elements_located() using the following CSS selector:
url = "https://www.hostelworld.com/s?q=New%20York,%20New%20York,%20USA&country=USA&city=New%20York&type=city&id=13&from=2020-08-14&to=2020-08-16&guests=2&page=1"
driver.get(url)

# Wait up to 10 seconds until all title links matching the selector are visible.
cards = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div.property-card h2.title.title-6>a")))
for card in cards:
    title = card.text
    print(title)
Output:
The Local NYC
HI NYC Hostel
NY Moore Hostel
Broadway Hotel n Hostel
Q4 Hotel
American Dream Hostel
Giorgio Hotel
Freehand New York
West Side YMCA
Hotel 31
Vanderbilt YMCA
Union Hotel Brooklyn
Victorian Inn
Central Park West Hostel
Jazz on the Park Youth Hotel
The Jane
Nesva Hotel
John Hotel
Note that you need the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
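WebDriverWait.until repeatedly evaluates a condition until it returns a truthy value or the timeout expires, which is why it helps on pages that render their content after the initial load. The sketch below is a simplified, hypothetical wait_until helper that illustrates that polling loop in plain Python. It is not Selenium's implementation: among other differences, Selenium's version ignores NoSuchElementException between polls by default and raises TimeoutException rather than TimeoutError.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Poll condition() until it returns a truthy value, then return that value.

    Hypothetical helper for illustration only; raises TimeoutError if the
    condition is still falsy once the timeout has elapsed.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)


# Simulated dynamic page: the "cards" only appear on the third check.
attempts = 0


def cards_visible():
    global attempts
    attempts += 1
    return ["card 1", "card 2"] if attempts >= 3 else []


cards = wait_until(cards_visible, timeout=2, poll=0.01)
print(attempts, cards)  # 3 ['card 1', 'card 2']
```

Returning the condition's own value (rather than True) mirrors how the answer uses the result of until() directly as the list of elements.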
Updated to include the price:
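from selenium.common.exceptions import NoSuchElementException

url = "https://www.hostelworld.com/s?q=New%20York,%20New%20York,%20USA&country=USA&city=New%20York&type=city&id=13&from=2020-08-14&to=2020-08-16&guests=2&page=1"
driver.get(url)

# Wait for the cards themselves, then search inside each card.
cards = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div.property-card")))
for card in cards:
    try:
        # Both selectors are evaluated relative to the current card.
        title = card.find_element(By.CSS_SELECTOR, "h2.title.title-6>a").text
        print(title)
        price = card.find_element(By.CSS_SELECTOR, "p.price.title-5").text
        print(price)
    except NoSuchElementException:
        # Skip the rest of a card that has no matching title or price element.
        continue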
Output:
The Local NYC
€45
HI NYC Hostel
€41
NY Moore Hostel
€158
Broadway Hotel n Hostel
€73
Freehand New York
€95
Q4 Hotel
€37
Giorgio Hotel
€158
American Dream Hostel
€128
West Side YMCA
€87
Vanderbilt YMCA
€89
Hotel 31
€74
Union Hotel Brooklyn
€128
Victorian Inn
€88
Central Park West Hostel
€42
The Jane
€115
Jazz on the Park Youth Hotel
€78
Nesva Hotel
€136
John Hotel
€165
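The try/except in the loop above exists because a card may lack one of the elements. An alternative that avoids exceptions is to look up each child relative to the card and record a missing element as None; in Selenium, find_elements (plural) returns an empty list instead of raising when nothing matches. The sketch below shows the pattern offline with xml.etree.ElementTree on hypothetical, simplified markup; the third card and its name are made up for illustration, and the exact class attributes are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; the third card deliberately has no price element.
PAGE = """
<div>
  <div class="property-card">
    <h2 class="title title-6"><a>The Local NYC</a></h2>
    <p class="price title-5">€45</p>
  </div>
  <div class="property-card">
    <h2 class="title title-6"><a>HI NYC Hostel</a></h2>
    <p class="price title-5">€41</p>
  </div>
  <div class="property-card">
    <h2 class="title title-6"><a>Hypothetical Hostel</a></h2>
  </div>
</div>
"""


def text_or_none(card, path):
    """Return the text of the first match under card, or None if absent."""
    found = card.find(path)
    return found.text if found is not None else None


rows = []
for card in ET.fromstring(PAGE.strip()).findall(".//div[@class='property-card']"):
    title = text_or_none(card, ".//h2/a")
    price = text_or_none(card, ".//p[@class='price title-5']")
    rows.append((title, price))

for title, price in rows:
    print(title, price if price is not None else "(no price)")
```

With Selenium, a rough equivalent per card is prices = card.find_elements(By.CSS_SELECTOR, "p.price.title-5") followed by price = prices[0].text if prices else None. Note that the exact @class match used in the sketch is brittle in a real browser, where elements often carry extra classes.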