python python-3.x selenium-webdriver web-scraping

Reputation: 1187

Selenium find element by class name two parameters

How do I find elements by class name without repeating the output? I have two classes to scrape, hdrlnk and result-price. I wrote the code like this:

x = driver.find_elements_by_class_name(['hdrlnk','result-price'])

and it raises an error. Here is another version I tried:

x = driver.find_elements_by_class_name('hdrlnk')
y = driver.find_elements_by_class_name('result-price')
for xs in x:
    for ys in y:
        print(xs.text + ys.text)

But I got the result like this

sony 5 disc cd changer$40
sony 5 disc cd changer$70
sony 5 disc cd changer$70
sony 5 disc cd changer$190
sony 5 disc cd changer$190
sony 5 disc cd changer$190
sony 5 disc cd changer$190
sony 5 disc cd changer$10

The part of the HTML structure that I am trying to scrape

<p class="result-info">
    <span class="icon icon-star" role="button" title="save this post in your favorites list">
        <span class="screen-reader-text">favorite this post</span>
    </span>
    <time class="result-date" datetime="2019-11-07 18:20" title="Thu 07 Nov 06:20:56 PM">Nov  7</time>
    <a href="https://vancouver.craigslist.org/rch/ele/d/chandeliers/7015824686.html" data-id="7015824686" class="result-title hdrlnk">CHANDELIERS</a>
    <span class="result-meta">
        <span class="result-price">$800</span>
        <span class="result-hood"> (Richmond)</span>
        <span class="result-tags">
            <span class="pictag">pic</span>
        </span>
        <span class="banish icon icon-trash" role="button">
            <span class="screen-reader-text">hide this posting</span>
        </span>
        <span class="unbanish icon icon-trash red" role="button" aria-hidden="true"></span>
        <a href="#" class="restore-link">
            <span class="restore-narrow-text">restore</span>
            <span class="restore-wide-text">restore this posting</span>
        </a>
    </span>
</p>

The first element is repeated, but the value of the second one is correct. How do I fix this?

Upvotes: 0

Answers (4)

frianH

Reputation: 7563

I don't think you need a nested loop. Iterate by index instead, using len() to get the length of the list:

x = driver.find_elements_by_class_name('hdrlnk')
# y = driver.find_elements_by_class_name('result-price')
y = driver.find_elements_by_xpath('//p[@class="result-info"]/span[@class="result-meta"]//span[@class="result-price"]')

# Both lists should have the same length for index pairing to work
print(len(x))
print(len(y))

for i in range(len(x)):
    print(x[i].text + y[i].text)

UPDATE

To explain: I assume you want to pair each member of x with the corresponding member of y, like this:

x[0] with y[0]
x[1] with y[1]
etc....

This assumes x and y contain the same number of elements. For that reason I only need the length of x to drive the loop (you could equally use y).

If you want to iterate over both lists at once, you can use zip, as shown in the other answers in this thread.
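
To illustrate why the nested loop in the question repeats the first title, here is a minimal sketch that uses plain lists of strings in place of WebElement objects (the sample titles and prices are made up for illustration). A nested loop visits every title/price combination, whereas zip pairs items by position.

```python
# Stand-in data: plain strings instead of Selenium WebElements (illustrative values).
titles = ["CHANDELIERS", "sony 5 disc cd changer", "desk lamp"]
prices = ["$800", "$40", "$15"]

# Nested loops produce every combination: len(titles) * len(prices) lines.
nested = [t + p for t in titles for p in prices]

# zip pairs the i-th title with the i-th price: one line per listing.
paired = [t + p for t, p in zip(titles, prices)]

print(len(nested))  # 9
for line in paired:
    print(line)
```

The index-based loop above and zip give the same pairing as long as both lists have the same length.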

For XPath, see: Locator Strategies

Copying an XPath from the browser's inspect-element tool gives you an absolute path. I don't recommend it, because it breaks easily when the page changes.

Please see this thread: Absolute vs Relative Xpath

Upvotes: 3

Code-Apprentice

Reputation: 83567

It looks like you have elements with classes hdrlnk and result-price that come in pairs. So you need to iterate the lists in parallel with zip():

xs = driver.find_elements_by_class_name('hdrlnk')
ys = driver.find_elements_by_class_name('result-price')
for x, y in zip(xs, ys):
    print(x.text, y.text)

This assumes that the two lists contain the same number of elements in the correct order so that they match up correctly with zip(). It is probably safer to parse them directly from the HTML by iterating over the parent <p> elements:

ps = driver.find_elements_by_class_name('result-info')
for p in ps:
    x = p.find_element_by_class_name('hdrlnk')
    y = p.find_element_by_class_name('result-price')
    print(x.text, y.text)
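
Scoping the lookup to each parent also copes with listings that have no price, which would otherwise silently misalign zip(). The sketch below is one way to do that, under a few assumptions: it uses the find_elements(by, value) form from Selenium 4 (the find_elements_by_* helpers were removed in Selenium 4.3), it passes the locator strategy as the plain string "css selector" (the value of By.CSS_SELECTOR) so the snippet needs no imports, and scrape_listings is a hypothetical helper name.

```python
CSS = "css selector"  # same string as selenium.webdriver.common.by.By.CSS_SELECTOR


def scrape_listings(driver):
    """Return (title, price) pairs, one per p.result-info row.

    price is None when a row has no span.result-price.
    """
    rows = []
    for row in driver.find_elements(CSS, "p.result-info"):
        titles = row.find_elements(CSS, "a.hdrlnk")
        if not titles:
            continue  # not a listing row; skip it
        # find_elements returns an empty list (rather than raising) when nothing matches
        prices = row.find_elements(CSS, "span.result-price")
        rows.append((titles[0].text, prices[0].text if prices else None))
    return rows
```

With a live driver this could be used as: for title, price in scrape_listings(driver): print(title, price).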

Upvotes: 2

undetected Selenium

Reputation: 193298

If your use case is to use find_elements_by_class_name(), a better approach would be to induce WebDriverWait for visibility_of_all_elements_located(), and you can use any of the following Locator Strategies:

Using CLASS_NAME:

items = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CLASS_NAME, "hdrlnk")))
prices = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CLASS_NAME, "result-price")))
for i,j in zip(items, prices):
    print(i.text + j.text)

However, a more canonical approach would be to use either of the following:

CSS_SELECTOR:

items = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "p.result-info a.hdrlnk")))
prices = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "p.result-info span.result-meta>span.result-price")))
for i,j in zip(items, prices):
    print(i.text + j.text)

XPATH:

items = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//p[@class='result-info']//a[contains(@class, 'hdrlnk')]")))
prices = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//p[@class='result-info']//span[@class='result-meta']/span[@class='result-price']")))
for i,j in zip(items, prices):
    print(i.text + j.text)

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
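
For readers unfamiliar with explicit waits: WebDriverWait(driver, 20).until(condition) repeatedly calls the condition until it returns something truthy or the timeout expires. The following is a simplified stand-in for that polling idea in plain Python, not Selenium's actual implementation; wait_until and its parameters are hypothetical names.

```python
import time


def wait_until(condition, timeout=20.0, poll=0.5):
    """Call condition() every `poll` seconds until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll)
```

The benefit over a bare find_elements call is that the script does not read the page before the listings have rendered.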

Upvotes: 1

JeffC

Reputation: 25731

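.find_elements_by_class_name() only takes a single class name. What I would suggest is using a CSS selector to do this job. In the posted HTML the price is not a descendant of the .hdrlnk link; it sits inside the link's sibling span.result-meta, so the selector needs the sibling combinator, e.g. .hdrlnk ~ .result-meta .result-price. The code would look like

prices = driver.find_elements_by_css_selector('.hdrlnk ~ .result-meta .result-price')

This finds all the price elements. If you also want the labels, you will have to write a little more code.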

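for heading in driver.find_elements_by_css_selector('.hdrlnk'):
    print(heading.text)
    # Look only inside this heading's sibling span.result-meta, not at every later price on the page
    for price in heading.find_elements_by_xpath('./following-sibling::span[@class="result-meta"]/span[@class="result-price"]'):
        print('  ' + price.text)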

See the docs for all the options to find elements.

CSS selector references:
W3C reference
Selenium Tips: CSS Selectors
Taming Advanced CSS Selectors

Upvotes: 5
