Hi guys, this is my first question. I am trying to extract data from a website, but the data only appears when I hover my mouse over it. The website is http://insideairbnb.com/melbourne/ and I want to extract the occupancy rate for each listing from the panel that pops up when I hover my mouse pointer over the points on the map. I am trying to use @frianH's code from this Stack Overflow post: Scrape website with dynamic mouseover event. I am a newbie at data extraction with Selenium, though I know the bs4 package. I haven't been able to find the right XPath to complete the task. My code so far is:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver import ActionChains
from selenium import webdriver
chrome_options = webdriver.ChromeOptions()
browser = webdriver.Chrome(options=chrome_options, executable_path='C:\\Users\\Kunal\\chromedriver.exe')
browser.get('http://insideairbnb.com/melbourne/')
browser.maximize_window()
# wait for all circles
elements = WebDriverWait(browser, 20).until(EC.visibility_of_all_elements_located((By.XPATH, '//*[@id="map"]/div[1]/div[2]/div[2]/svg')))
table = browser.find_element_by_class_name('leaflet-zoom-animated')
# move to the table
browser.execute_script("arguments[0].scrollIntoView(true);", table)
data = []
for circle in elements:
    # move to each circle
    ActionChains(browser).move_to_element(circle).perform()
    # wait for the mouseover effect
    mouseover = WebDriverWait(browser, 30).until(EC.visibility_of_element_located((By.XPATH, '//*[@id="neighbourhoodBoundaries"]')))
    data.append(mouseover.text)
print(data[0])
Thanks in advance.
I checked out the page a bunch, and it seems quite resistant to Selenium's own hover actions, so we'll have to rely on JavaScript. Here's the full code:
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium import webdriver
chrome_options = webdriver.ChromeOptions()
# Selenium 3 style: executable_path is deprecated in Selenium 4 (and later removed);
# there, pass service=Service('chromedriver.exe') from selenium.webdriver.chrome.service instead
browser = webdriver.Chrome(options=chrome_options, executable_path='chromedriver.exe')
browser.get('http://insideairbnb.com/melbourne/')
browser.maximize_window()
# Set up a 30 second WebDriverWait
explicit_wait30 = WebDriverWait(browser, 30)
circles_selector = (By.CSS_SELECTOR, 'svg.leaflet-zoom-animated > g:nth-child(2) > circle')
try:
    # Wait for all circles to load
    circles = explicit_wait30.until(EC.presence_of_all_elements_located(circles_selector))
except TimeoutException:
    # The circles are slow to load: refresh the page and wait once more
    browser.refresh()
    circles = explicit_wait30.until(EC.presence_of_all_elements_located(circles_selector))
data = []
for circle in circles:
    # Execute mouseover on the element
    browser.execute_script("const mouseoverEvent = new Event('mouseover');arguments[0].dispatchEvent(mouseoverEvent)", circle)
    # Wait for the data to appear
    listing = explicit_wait30.until(EC.visibility_of_element_located((By.CSS_SELECTOR, '#listingHover')))
    # listing now contains the full element list; parse it however you need.
    # The simplest option is to keep its raw text:
    data.append(listing.text)
    # Close the listing
    browser.execute_script("arguments[0].click()", listing.find_element(By.TAG_NAME, 'button'))
I'm also using CSS selectors instead of XPath. Here's how the flow works:
circles = explicit_wait30.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'svg.leaflet-zoom-animated > g:nth-child(2) > circle')))
This waits until all the circles are present and stores them in circles.
Do keep in mind that the page is very slow to load the circles, so I've set up a try/except block to refresh the page automatically if they don't load within 30 seconds. Feel free to change this however you want.
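If a single refresh isn't enough, the same idea can be generalised into a bounded retry loop. This is a minimal sketch, assuming you pass the wait and the refresh in as callables; wait_with_refresh and its parameters are hypothetical names, not part of Selenium:

```python
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def wait_with_refresh(
    wait: Callable[[], T],
    refresh: Callable[[], None],
    attempts: int = 3,
    timeout_errors: Tuple[Type[BaseException], ...] = (TimeoutError,),
) -> T:
    """Hypothetical helper: call wait(); on a timeout, call refresh() and retry.

    Re-raises the last timeout once `attempts` calls to wait() have failed.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return wait()
        except timeout_errors:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the timeout
            refresh()  # reload the page before waiting again
    raise AssertionError("unreachable")


# Possible use with Selenium (assumes the names from the answer's code):
# circles = wait_with_refresh(
#     lambda: explicit_wait30.until(EC.presence_of_all_elements_located(
#         (By.CSS_SELECTOR, 'svg.leaflet-zoom-animated > g:nth-child(2) > circle'))),
#     browser.refresh,
#     timeout_errors=(TimeoutException,),
# )
```

Passing the exception types in keeps the helper independent of Selenium, so the retry logic can be checked without a browser.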
Now we loop through all the circles:
for circle in circles:
Next, we simulate a mouseover event on the circle. We'll use JavaScript to do this.
This is what the JavaScript looks like (note that circle refers to the element we pass in from Selenium):
const mouseoverEvent = new Event('mouseover');
circle.dispatchEvent(mouseoverEvent)
This is how the script is executed through Selenium:
browser.execute_script("const mouseoverEvent = new Event('mouseover');arguments[0].dispatchEvent(mouseoverEvent)", circle)
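An event created with new Event('mouseover') does not bubble by default, so it only reaches listeners attached to the circle itself. If the panel ever fails to appear because the handler sits on a parent element, one variant worth trying (a swapped-in technique, not the answer's) is dispatching a MouseEvent with bubbles set to true. hover_with_js is a hypothetical helper; it only assumes the driver has Selenium's execute_script(script, *args) method:

```python
# Hypothetical helper: fire a synthetic mouseover on an element via JavaScript.
# Unlike `new Event('mouseover')`, this MouseEvent bubbles, so listeners on
# ancestor elements see it as well as listeners on the element itself.
HOVER_SCRIPT = (
    "const el = arguments[0];"
    "el.dispatchEvent(new MouseEvent('mouseover', {bubbles: true, cancelable: true}));"
)


def hover_with_js(driver, element):
    """Run HOVER_SCRIPT in the page with `element` bound to arguments[0]."""
    driver.execute_script(HOVER_SCRIPT, element)
```

Inside the loop, hover_with_js(browser, circle) would take the place of the execute_script line.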
Now we wait for the listing to appear:
listing = explicit_wait30.until(EC.visibility_of_element_located((By.CSS_SELECTOR, '#listingHover')))
Now you have listing, an element that contains many child elements. You can extract each of them however you like and store the results in data.
If you don't need to extract each element separately, simply calling .text on listing will give you something like this:
'Tanya\n(No other listings)\n23127829\nSerene room for a single person or a couple.\nGreater Dandenong\nPrivate room\n$37 income/month (est.)\n$46 /night\n4 night minimum\n10 nights/year (est.)\n2.7% occupancy rate (est.)\n0.1 reviews/month\n1 reviews\nlast: 20/02/2018\nLOW availability\n0 days/year (0%)\nclick listing on map to "pin" details'
That's it. Append the result to data and you're done!
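Since the question is after the occupancy rate, here is a minimal sketch that pulls a few numbers out of the .text string shown above with regular expressions. It assumes the panel text keeps the line format of that sample (for example "2.7% occupancy rate (est.)" and "$46 /night"); parse_listing_text and the field names are hypothetical:

```python
import re
from typing import Dict, Optional

# Patterns assume the line format of the sample hover text, e.g.
# "$46 /night", "10 nights/year (est.)", "2.7% occupancy rate (est.)".
PATTERNS = {
    "price_per_night": re.compile(r"^\$([\d,.]+) /night$", re.MULTILINE),
    "nights_per_year": re.compile(r"^([\d,.]+) nights/year", re.MULTILINE),
    "occupancy_rate": re.compile(r"^([\d.]+)% occupancy rate", re.MULTILINE),
}


def parse_listing_text(text: str) -> Dict[str, Optional[float]]:
    """Extract numeric fields from a hover panel's text; missing fields become None."""
    result: Dict[str, Optional[float]] = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(text)
        # Strip thousands separators such as "1,234" before converting
        result[field] = float(match.group(1).replace(",", "")) if match else None
    return result
```

Applied to the collected texts, [parse_listing_text(t) for t in data] gives one dict per listing, with occupancy_rate as a percentage.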