Reputation: 1187
How do I find elements by class name without repeating the output? I have two classes to scrape, hdrlnk and result-price. I wrote the code like this:
x = driver.find_elements_by_class_name(['hdrlnk','result-price'])
and it gives me an error. Here is another version I tried:
x = driver.find_elements_by_class_name('hdrlnk')
y = driver.find_elements_by_class_name('result-price')
for xs in x:
    for ys in y:
        print(xs.text + ys.text)
But I got output like this:
sony 5 disc cd changer$40
sony 5 disc cd changer$70
sony 5 disc cd changer$70
sony 5 disc cd changer$190
sony 5 disc cd changer$190
sony 5 disc cd changer$190
sony 5 disc cd changer$190
sony 5 disc cd changer$10
Here is the part of the HTML structure that I am trying to scrape:
<p class="result-info">
<span class="icon icon-star" role="button" title="save this post in your favorites list">
<span class="screen-reader-text">favorite this post</span>
</span>
<time class="result-date" datetime="2019-11-07 18:20" title="Thu 07 Nov 06:20:56 PM">Nov 7</time>
<a href="https://vancouver.craigslist.org/rch/ele/d/chandeliers/7015824686.html" data-id="7015824686" class="result-title hdrlnk">CHANDELIERS</a>
<span class="result-meta">
<span class="result-price">$800</span>
<span class="result-hood"> (Richmond)</span>
<span class="result-tags">
<span class="pictag">pic</span>
</span>
<span class="banish icon icon-trash" role="button">
<span class="screen-reader-text">hide this posting</span>
</span>
<span class="unbanish icon icon-trash red" role="button" aria-hidden="true"></span>
<a href="#" class="restore-link">
<span class="restore-narrow-text">restore</span>
<span class="restore-wide-text">restore this posting</span>
</a>
</span>
</p>
The first value (the title) is repeated, but the second value (the price) is correct. How do I fix this?
Upvotes: 0
Views: 9533
Reputation: 7563
I don't think you need a nested loop. Iterate by index instead, using len():
x = driver.find_elements_by_class_name('hdrlnk')
#y = driver.find_elements_by_class_name('result-price')
y = driver.find_elements_by_xpath('//p[@class="result-info"]/span[@class="result-meta"]//span[@class="result-price"]')
# Both counts should be the same for the index-based pairing below
print(len(x))
print(len(y))
for i in range(len(x)):
    print(x[i].text + y[i].text)
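To see why the nested loop in the question repeats the title while the index loop does not, here is a minimal sketch that needs no browser. Plain strings stand in for each WebElement's `.text`, and the titles and prices are made-up sample data, not scraped values.

```python
# Made-up sample data standing in for the .text of the scraped WebElements.
titles = ["sony 5 disc cd changer", "CHANDELIERS", "desk lamp"]
prices = ["$40", "$800", "$10"]

# Nested loop: every title is combined with every price (a cross product),
# which is why the first title shows up next to each price in turn.
nested = [t + p for t in titles for p in prices]

# Index-based loop, as in this answer: title i is paired with price i only.
paired = [titles[i] + prices[i] for i in range(len(titles))]

for line in paired:
    print(line)
```

With 3 titles and 3 prices the nested version produces 9 lines, while the index version produces 3.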
UPDATE
I assume you want to pair each member of x with the corresponding member of y, like this:
x[0] with y[0]
x[1] with y[1]
etc.
I'm assuming x and y have the same number of elements. That is why I only need x to drive the loop (although you could use y instead).
If you want to include both of them in the loop, you can use zip. See the other answers in this thread.
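Because the index-based loop relies on x and y having the same length, it can help to check that explicitly. The sketch below uses a hypothetical helper, `pair_titles_with_prices`, with plain strings in place of WebElements. It raises an error instead of silently mismatching pairs, which could happen if, say, some listing had no price element.

```python
def pair_titles_with_prices(titles, prices):
    """Hypothetical helper: pair titles[i] with prices[i], but only if the counts match.

    range(len(x)) raises IndexError when y is shorter, and zip() silently
    drops the extra items, so the lengths are compared up front instead.
    """
    if len(titles) != len(prices):
        raise ValueError(
            f"found {len(titles)} titles but {len(prices)} prices; "
            "pairing by position would mismatch them"
        )
    return list(zip(titles, prices))


print(pair_titles_with_prices(["CHANDELIERS"], ["$800"]))
```

On Python 3.10 and later, `zip(titles, prices, strict=True)` gives similar protection: it raises ValueError when the inputs have different lengths.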
For XPath, see: Locator Strategies
Copying the XPath from Inspect Element gives you an absolute path. I don't recommend it, because it breaks easily when the page structure changes.
See this thread: Absolute vs Relative Xpath
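A small illustration of why position-based paths are fragile, assuming Python's standard-library `xml.etree.ElementTree` as a stand-in for the browser's XPath engine (it supports only a limited XPath subset, including `[1]`-style position predicates and exact `[@class='...']` matches). The markup is a trimmed version of the question's HTML, and the extra `result-badge` span is a hypothetical page change.

```python
import xml.etree.ElementTree as ET

# Trimmed version of the listing markup from the question.
ORIGINAL = """
<p class="result-info">
  <time class="result-date">Nov 7</time>
  <a class="result-title hdrlnk">CHANDELIERS</a>
  <span class="result-meta">
    <span class="result-price">$800</span>
    <span class="result-hood"> (Richmond)</span>
  </span>
</p>
"""
# Hypothetical redesign: one extra span is inserted before the price.
CHANGED = ORIGINAL.replace(
    '<span class="result-price">',
    '<span class="result-badge">NEW</span><span class="result-price">',
)

# Position-based path, similar in spirit to a copied absolute XPath.
POSITIONAL = "./span[1]/span[1]"
# Attribute-based relative path.
RELATIVE = ".//span[@class='result-price']"

for markup in (ORIGINAL, CHANGED):
    root = ET.fromstring(markup.strip())
    # The positional path silently switches to the new span; the relative one does not.
    print(root.find(POSITIONAL).text, root.find(RELATIVE).text)
```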
Upvotes: 3
Reputation: 83567
It looks like you have elements with classes hdrlnk and result-price that come in pairs, so you need to iterate over the two lists in parallel with zip():
xs = driver.find_elements_by_class_name('hdrlnk')
ys = driver.find_elements_by_class_name('result-price')
for x, y in zip(xs, ys):
    print(x.text, y.text)
This assumes that the two lists contain the same number of elements, in the same order, so that zip() matches them up correctly. It is probably safer to parse them directly from the HTML by iterating over the parent <p> elements:
ps = driver.find_elements_by_class_name('result-info')
for p in ps:
    # Search only inside this <p>, so each title stays with its own price
    x = p.find_element_by_class_name('hdrlnk')
    y = p.find_element_by_class_name('result-price')
    print(x.text, y.text)
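The same per-parent idea can be tried offline. This sketch uses `xml.etree.ElementTree` instead of Selenium, with a made-up two-listing fragment; the helper names `has_class` and `listings` are hypothetical. It looks for the title and price only inside each `<p class="result-info">`, so a listing without a price yields `None` rather than shifting every later pair. (Selenium's `find_element_by_class_name` would raise NoSuchElementException in that case instead.)

```python
import xml.etree.ElementTree as ET

# Made-up page fragment with two listings; the second one has no price.
PAGE = """
<div>
  <p class="result-info">
    <a class="result-title hdrlnk">CHANDELIERS</a>
    <span class="result-meta"><span class="result-price">$800</span></span>
  </p>
  <p class="result-info">
    <a class="result-title hdrlnk">sony 5 disc cd changer</a>
    <span class="result-meta"></span>
  </p>
</div>
"""


def has_class(elem, name):
    """True if `name` is one of the whitespace-separated classes on elem."""
    return name in elem.get("class", "").split()


def listings(root):
    """Yield (title, price) for each <p class="result-info">, searching only inside it."""
    for p in root.iter("p"):
        if not has_class(p, "result-info"):
            continue
        title = next((e.text for e in p.iter() if has_class(e, "hdrlnk")), None)
        price = next((e.text for e in p.iter() if has_class(e, "result-price")), None)
        yield title, price


root = ET.fromstring(PAGE.strip())
for title, price in listings(root):
    print(title, price)
```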
Upvotes: 2
Reputation: 193298
If your use case is to use find_elements_by_class_name(), a better approach would be to induce WebDriverWait for visibility_of_all_elements_located(), and you can use either of the following Locator Strategies:
Using CLASS_NAME:
items = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CLASS_NAME, "hdrlnk")))
prices = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CLASS_NAME, "result-price")))
for i, j in zip(items, prices):
    print(i.text + j.text)
However, a canonical approach would be to use either of the following:
CSS_SELECTOR:
items = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "p.result-info a.hdrlnk")))
prices = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "p.result-info span.result-meta>span.result-price")))
for i, j in zip(items, prices):
    print(i.text + j.text)
XPATH:
items = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//p[@class='result-info']//a[contains(@class, 'hdrlnk')]")))
prices = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//p[@class='result-info']//span[@class='result-meta']/span[@class='result-price']")))
for i, j in zip(items, prices):
    print(i.text + j.text)
Note: you have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
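Roughly speaking, `WebDriverWait(...).until(...)` keeps re-evaluating the expected condition until it returns something truthy or the timeout expires. The sketch below is a simplified, hypothetical stand-in (`wait_until` is not a Selenium function). The real class passes the driver to the condition, polls every 0.5 seconds by default, and raises `TimeoutException` rather than Python's built-in `TimeoutError`.

```python
import time


def wait_until(condition, timeout=20.0, poll_interval=0.5):
    """Hypothetical, simplified stand-in for WebDriverWait(driver, timeout).until(condition).

    Calls condition() repeatedly until it returns a truthy value, then returns
    that value. Raises TimeoutError if the timeout expires first.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Simulated page: the price elements only "appear" on the third poll.
calls = {"n": 0}


def fake_visible_prices():
    calls["n"] += 1
    return ["$800"] if calls["n"] >= 3 else []


print(wait_until(fake_visible_prices, timeout=5, poll_interval=0.01))
```

An empty list counts as "not yet", which matches how `visibility_of_all_elements_located` returns a falsy value until the elements are visible.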
Upvotes: 1
Reputation: 25731
.find_elements_by_class_name() only takes a single class name. What I would suggest is using a CSS selector to do this job, e.g. .result-info .result-price (the price is not inside the .hdrlnk link, so .hdrlnk .result-price would match nothing). The code would look like:
prices = driver.find_elements_by_css_selector('.result-info .result-price')
for price in prices:
    print(price.text)
This prints all the prices. If you also want the labels, you will have to write a little more code.
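for heading in driver.find_elements_by_css_selector('.hdrlnk'):
    print(heading.text)
    # following-sibling:: stays inside this listing; following:: would also match every later listing's price
    for price in heading.find_elements_by_xpath('./following-sibling::span[@class="result-meta"]/span[@class="result-price"]'):
        print(' ' + price.text)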
See the docs for all the options to find elements.
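A descendant selector such as `.hdrlnk .result-price` only matches a price nested inside the link. A quick offline check with `xml.etree.ElementTree` on a trimmed copy of the question's HTML shows that the price actually sits in the link's sibling `<span class="result-meta">`. ElementTree's XPath subset stands in for CSS here, and note that it compares the whole `class` attribute exactly rather than individual class names.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the listing markup from the question.
LISTING = """
<p class="result-info">
  <a class="result-title hdrlnk">CHANDELIERS</a>
  <span class="result-meta">
    <span class="result-price">$800</span>
  </span>
</p>
"""
root = ET.fromstring(LISTING.strip())

# Equivalent of ".hdrlnk .result-price": a price *inside* the link. None exists.
inside_link = root.findall(".//a[@class='result-title hdrlnk']//span[@class='result-price']")

# The price lives in the link's sibling <span class="result-meta"> instead.
in_sibling_meta = root.findall("./span[@class='result-meta']/span[@class='result-price']")

print(len(inside_link), [e.text for e in in_sibling_meta])
```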
CSS selector references:
W3C reference
Selenium Tips: CSS Selectors
Taming Advanced CSS Selectors
Upvotes: 5