Making xpath more selective? [Web scraping]

I am trying to print some housing prices and am having trouble with XPath. Here's my code:

from selenium import webdriver
driver = webdriver.Chrome("my/path/here")

driver.get("https://www.realtor.com/realestateandhomes-search/?pgsz=10")
for house_number in range(1,11):
    try:
        price = driver.find_element_by_xpath("""//*[@id="{}"]/div[2]/div[1]""".format(house_number))
        print(price.text)
    except:
        print('couldnt find')

I am on this website, trying to print off the housing prices of the first ten houses.

In my output, every house with a "NEW" sticker has "NEW" recorded instead of its actual price. The bottom two, which don't have that sticker, show the actual price.

How do I write my XPath so it selects the price and not NEW?

Upvotes: 0

Answers (3)

thebadguy

Reputation: 2140

Can you try this code:

from selenium import webdriver
driver = webdriver.Chrome()
driver.maximize_window()
driver.get("https://www.realtor.com/realestateandhomes-search/Bladen-County_NC/sby-6/pg-1?pgsz=10")

prices = driver.find_elements_by_xpath('//*[@class="data-price-display"]')

for price in prices:
    print(price.text)

It will print

$39,900
$86,500
$39,500
$40,000
$179,000
$31,000
$104,900
$94,900
$54,900
$19,900

Let me know if any other details are required.

Upvotes: 0

kerberos

Reputation: 1655

You can write it like this, with image loading disabled, which can speed up page fetching:

from selenium import webdriver
# Disable image loading (a value of 2 blocks images)
chrome_opt = webdriver.ChromeOptions()
prefs = {"profile.managed_default_content_settings.images": 2}
chrome_opt.add_experimental_option("prefs", prefs)
driver = webdriver.Chrome(chrome_options=chrome_opt,executable_path="my/path/here")
driver.get("https://www.realtor.com/realestateandhomes-search/Bladen-County_NC/sby-6/pg-1?pgsz=10")
for house_number in range(1,11):
    try:
        price = driver.find_element_by_xpath('//*[@id="{}"]/div[2]/div[@class="srp-item-price"]'.format(house_number))
        print(price.text)
    except:
        print('couldnt find')

Upvotes: 1

budi

Reputation: 6551

You're on the right track; your XPath is just too brittle. I would make it a little more explicit, so that it doesn't rely on indices and wildcards.

Here's your XPath (I used id="1" for example purposes):

//*[@id="1"]/div[2]/div[1]

And here's the HTML (some attributes/elements removed for brevity):

<li id="1">
    <div></div>
    <div class="srp-item-body">
        <div>New</div><!-- this is optional! -->
        <div class="srp-item-price">$100,000</div>
    </div>
</li>

First, replace the * wildcard with the element that you are expecting to contain the id="1". This simply serves as a way to help "self-document" the XPath a little bit better:

//li[@id="1"]/div[2]/div[1]

Next, you want to target the second <div>, but instead of searching by index, try to use the element's attributes if applicable, such as class:

//li[@id="1"]/div[@class="srp-item-body"]/div[1]

Lastly, you want to target the <div> with the price. Since the "New" text is in its own <div>, your XPath was targeting the first <div> ("New"), not the <div> with the price. Your XPath only worked when the "New" <div> did not exist, because then the price <div> was the first one.

We can use the same method as in the previous step and target by attribute. This forces the XPath to always target the <div> with the price:

//li[@id="1"]/div[@class="srp-item-body"]/div[@class="srp-item-price"]
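
The difference between the index-based and attribute-based XPath can be checked without a browser. The following is only a minimal sketch, assuming the simplified HTML above is representative of the real page: it uses Python's built-in xml.etree.ElementTree (which supports a small XPath subset and needs a leading "." for relative paths) rather than Selenium. The <ul> wrapper, the second listing, its price, and the helper names first_text, by_index and by_class are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Two simplified listings modelled on the HTML above: one with the optional
# "New" badge and one without. The <ul> wrapper, the second <li> and its
# price are assumptions added for this sketch.
SAMPLE = """
<ul>
  <li id="1">
    <div></div>
    <div class="srp-item-body">
      <div>New</div>
      <div class="srp-item-price">$100,000</div>
    </div>
  </li>
  <li id="2">
    <div></div>
    <div class="srp-item-body">
      <div class="srp-item-price">$200,000</div>
    </div>
  </li>
</ul>
"""

root = ET.fromstring(SAMPLE)


def first_text(xpath):
    """Return the text of the first element matching xpath, or None."""
    node = root.find(xpath)
    return None if node is None else node.text


def by_index(house_number):
    # Index-based: returns whichever <div> happens to come first.
    return first_text(".//li[@id='{}']/div[2]/div[1]".format(house_number))


def by_class(house_number):
    # Attribute-based: always returns the price <div>.
    return first_text(
        ".//li[@id='{}']/div[@class='srp-item-body']"
        "/div[@class='srp-item-price']".format(house_number)
    )


for n in (1, 2):
    print(n, "index:", by_index(n), "| class:", by_class(n))
```

For the listing with the "New" badge, the index-based path returns the badge text while the class-based path returns the price; for the listing without the badge, both return the price.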

Hope this helps!

And so... having said all of that, if you are just interested in the prices and nothing else, this would probably also work :)

for price in driver.find_elements_by_class_name('srp-item-price'):
    print(price.text)

Upvotes: 0
