Kunal Dhotre

Reputation: 43

Using selenium and python to extract data when it pops up after mouse hover

Hi guys, this is my first question. I am trying to extract data from a website, but the data only appears when I hover my mouse over it. The website is http://insideairbnb.com/melbourne/ . I want to extract the occupancy rate for each listing from the panel that pops up when I hover the mouse pointer over the points on the map. I am trying to use @frianH's code from the Stack Overflow post "Scrape website with dynamic mouseover event". I am new to data extraction with Selenium, though I know the bs4 package. I haven't been able to find the right XPath to complete the task. Thank you in advance. My code so far is:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver import ActionChains
from selenium import webdriver

chrome_options = webdriver.ChromeOptions()
browser = webdriver.Chrome(options=chrome_options, executable_path='C:\\Users\\Kunal\\chromedriver.exe')
browser.get('http://insideairbnb.com/melbourne/')
browser.maximize_window()

#wait all circle
elements = WebDriverWait(browser, 20).until(EC.visibility_of_all_elements_located((By.XPATH, '//*[@id="map"]/div[1]/div[2]/div[2]/svg')))
table = browser.find_element_by_class_name('leaflet-zoom-animated')

#move perform -> to table
browser.execute_script("arguments[0].scrollIntoView(true);", table)

data = []
for circle in elements:
    #move perform -> to each circle
    ActionChains(browser).move_to_element(circle).perform()
    # wait change mouseover effect
    mouseover = WebDriverWait(browser, 30).until(EC.visibility_of_element_located((By.XPATH, '//*[@id="neighbourhoodBoundaries"]')))
    data.append(mouseover.text)

print(data[0])

Thanks in advance.

Upvotes: 4

Views: 3197

Answers (1)

Chase

Reputation: 5615

I looked at the page for a while, and it seems quite resistant to Selenium's own methods, so we'll have to rely on JavaScript. Here's the full code:

from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver import ActionChains
from selenium import webdriver

chrome_options = webdriver.ChromeOptions()
browser = webdriver.Chrome(options=chrome_options, executable_path='chromedriver.exe')
browser.get('http://insideairbnb.com/melbourne/')
browser.maximize_window()

# Set up a 30-second WebDriverWait
explicit_wait30 = WebDriverWait(browser, 30)

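try:
    # Wait for all circles to load
    circles = explicit_wait30.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'svg.leaflet-zoom-animated > g:nth-child(2) > circle')))
except TimeoutException:
    # The page is slow: refresh once and wait again, so that `circles` is always defined
    browser.refresh()
    circles = explicit_wait30.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'svg.leaflet-zoom-animated > g:nth-child(2) > circle')))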

data = []
for circle in circles:
    # Execute mouseover on the element
    browser.execute_script("const mouseoverEvent = new Event('mouseover');arguments[0].dispatchEvent(mouseoverEvent)", circle)
    # Wait for the data to appear
    listing = explicit_wait30.until(EC.visibility_of_element_located((By.CSS_SELECTOR, '#listingHover')))
    # listing now contains the full element list - you can parse this yourself and add the necessary data to `data`
    .......
    # Close the listing
    browser.execute_script("arguments[0].click()", listing.find_element(By.TAG_NAME, 'button'))

I'm also using CSS selectors instead of XPath. Here's how the flow works:

circles = explicit_wait30.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'svg.leaflet-zoom-animated > g:nth-child(2) > circle')))

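This waits until matching circles are present in the DOM and stores them in circles. Note that the wait returns as soon as at least one circle matches, so circles holds whatever has been rendered at that moment.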

Do keep in mind that the page is very slow to load the circles, so I've set up a try/except block to refresh the page automatically if they don't load within 30 seconds. Feel free to change this however you want.
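If a single refresh is not enough, the wait-and-refresh idea can be wrapped in a small helper. This is only a sketch: until_with_refresh and its attempts parameter are hypothetical names, not part of Selenium. The exception class is passed in as an argument so the logic can be read and tried without a browser; with Selenium you would pass TimeoutException.

```python
def until_with_refresh(wait, browser, condition, timeout_exc, attempts=3):
    """Call wait.until(condition); on timeout, refresh the page and try again.

    wait        -- object with an .until(condition) method (e.g. WebDriverWait)
    browser     -- object with a .refresh() method (e.g. the webdriver)
    timeout_exc -- exception class raised on timeout (e.g. TimeoutException)
    attempts    -- total number of tries before giving up (hypothetical default)
    """
    for attempt in range(1, attempts + 1):
        try:
            return wait.until(condition)
        except timeout_exc:
            if attempt == attempts:
                raise  # out of tries: let the caller see the timeout
            browser.refresh()
```

With the names from the code above, the call would look like until_with_refresh(explicit_wait30, browser, EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'svg.leaflet-zoom-animated > g:nth-child(2) > circle')), TimeoutException).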

Now we have to loop through all the circles:

for circle in circles:

The next step is simulating a mouseover event on the circle. We'll use JavaScript to do this.

This is what the JavaScript looks like (note that circle refers to the element we will pass in from Selenium):

const mouseoverEvent = new Event('mouseover');
circle.dispatchEvent(mouseoverEvent)

This is how the script is executed through Selenium, where arguments[0] is the circle passed in from Python:

browser.execute_script("const mouseoverEvent = new Event('mouseover');arguments[0].dispatchEvent(mouseoverEvent)", circle)
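For reuse, that call can be wrapped in a small function. A minimal sketch, assuming driver is any object with Selenium's execute_script(script, *args) method; js_mouseover and MOUSEOVER_JS are hypothetical names, and the script itself is unchanged from the line above.

```python
# Same script as above: arguments[0] is the element passed in from Python.
MOUSEOVER_JS = (
    "const mouseoverEvent = new Event('mouseover');"
    "arguments[0].dispatchEvent(mouseoverEvent)"
)


def js_mouseover(driver, element):
    """Dispatch a synthetic 'mouseover' event on element via JavaScript."""
    driver.execute_script(MOUSEOVER_JS, element)
```

Inside the loop this would be called as js_mouseover(browser, circle).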

Now we have to wait for the listing to appear:

listing = explicit_wait30.until(EC.visibility_of_element_located((By.CSS_SELECTOR, '#listingHover')))

Now you have listing, an element that contains many other elements. You can extract each child element however you like and store the results in data.

If you don't need to extract each element separately, simply reading .text on listing gives something like this:

'Tanya\n(No other listings)\n23127829\nSerene room for a single person or a couple.\nGreater Dandenong\nPrivate room\n$37 income/month (est.)\n$46 /night\n4 night minimum\n10 nights/year (est.)\n2.7% occupancy rate (est.)\n0.1 reviews/month\n1 reviews\nlast: 20/02/2018\nLOW availability\n0 days/year (0%)\nclick listing on map to "pin" details'
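Since the question asks for the occupancy rate specifically, one option is to pull it out of that text with a regular expression. This is a sketch that assumes the line always has the form "<number>% occupancy rate", as in the sample above; parse_occupancy and OCCUPANCY_RE are hypothetical names.

```python
import re

# Matches e.g. "2.7% occupancy rate (est.)" and captures the number.
OCCUPANCY_RE = re.compile(r"(\d+(?:\.\d+)?)%\s+occupancy rate")


def parse_occupancy(listing_text):
    """Return the occupancy rate as a float, or None if the line is missing."""
    match = OCCUPANCY_RE.search(listing_text)
    return float(match.group(1)) if match else None


# A few lines taken from the sample output above
sample = ('$46 /night\n4 night minimum\n10 nights/year (est.)\n'
          '2.7% occupancy rate (est.)\n0.1 reviews/month')
print(parse_occupancy(sample))  # prints 2.7
```

Inside the loop, this would become data.append(parse_occupancy(listing.text)).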

That's it. Append the result to data and you're done!

Upvotes: 1
