Tony Simonovsky

Reputation: 199

Selenium+Python. How to locate several elements within a specific element?

I'm using Python+Selenium to scrape data from a site which lists companies' info.

For each company I need two data points: email and URL.

The problem is that some companies have no email listed. If I collect the URLs and the emails as two separate lists, I can't match them up: the email list will be shorter than the URL list, and I won't know which companies' emails are missing.

So I thought there might be a way to get the root element of each company's block (say, a div with class "provider") and then search inside each block for the email and URL.

Is this possible, and if so, how?

Upvotes: 1

Views: 336

Answers (3)

supputuri

Reputation: 14135

Here is the complete logic.

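from selenium import webdriver
from selenium.webdriver.common.by import By

# find_element(By...) replaces the find_element_by_* helpers, which newer Selenium releases removed
driver = webdriver.Chrome()
url = "https://clutch.co/web-designers?page=0"
driver.get(url)
pros = driver.find_elements(By.CSS_SELECTOR, "li.provider-row")
providers = []
for provider in pros:
    # Search only inside this provider's block, so the url and email stay paired
    pUrl = provider.find_element(By.CSS_SELECTOR, ".website-link.website-link-a a").get_attribute("realurl")
    # find_elements returns an empty list (instead of raising) when the email is missing
    emails = provider.find_elements(By.CSS_SELECTOR, ".contact-dropdown .item a")
    pEmail = emails[0].get_attribute("textContent") if emails else ""
    providers.append("{" + pUrl + "," + pEmail + "}")
print(providers)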
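
The same per-block lookup can be wrapped in a small helper that returns None for a missing value rather than an empty string. This is only a sketch: extract_pairs is a hypothetical name, the selectors are the ones used above, and root is assumed to be a Selenium WebDriver (or any object with a find_elements(by, value) method). The default "css selector" is the string value behind Selenium's By.CSS_SELECTOR.

```python
def extract_pairs(root, by="css selector"):
    """Return one (url, email) tuple per provider block; a missing value is None.

    `root` is assumed to be a Selenium WebDriver or WebElement. Anything that
    exposes find_elements(by, value) works the same way.
    """
    pairs = []
    for block in root.find_elements(by, "li.provider-row"):
        # Both lookups are scoped to this block, so the results cannot
        # drift out of alignment when a company has no email.
        links = block.find_elements(by, ".website-link.website-link-a a")
        emails = block.find_elements(by, ".contact-dropdown .item a")
        url = links[0].get_attribute("realurl") if links else None
        email = (emails[0].get_attribute("textContent") or "").strip() if emails else None
        pairs.append((url, email))
    return pairs
```

With a live browser session this would be called as extract_pairs(driver) after driver.get(url). Keeping each pair in one tuple means a company without an email still occupies its own slot in the list.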

Upvotes: 4

Tony Simonovsky

Reputation: 199

Ok, I found the solution.

First, collect all the blocks that contain the fields you need. Example:

from selenium.webdriver.common.by import By
providers = browser.find_elements(By.CLASS_NAME, 'provider-row')

Then call find_elements() with By.XPATH on each block, using a locator that starts with ".//", which means "search inside this specific element". Example:

providers[0].find_elements(By.XPATH, ".//li[@class='website-link website-link-a']/a[@class='sl-ext']")
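
The leading dot is what scopes the search. In XPath, a locator that begins with "//" starts from the document root, so it can match links from every block even when called on a single element. The sketch below illustrates the relative ".//" form without a browser, using the standard library's xml.etree.ElementTree in place of Selenium. The markup, the "email" class, and the example.* addresses are made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup standing in for the real page.
PAGE = """
<ul>
  <li class="provider-row">
    <a class="sl-ext" href="https://first.example">First</a>
    <a class="email" href="mailto:hi@first.example">Email</a>
  </li>
  <li class="provider-row">
    <a class="sl-ext" href="https://second.example">Second</a>
  </li>
</ul>
"""

root = ET.fromstring(PAGE)
blocks = root.findall(".//li[@class='provider-row']")

rows = []
for block in blocks:
    # ".//" is relative to `block`, so only this block's descendants match.
    site = block.find(".//a[@class='sl-ext']")
    email = block.find(".//a[@class='email']")
    rows.append((site.get("href"), email.get("href") if email is not None else None))

# Searching from the document root instead returns the links of every block.
all_sites = root.findall(".//a[@class='sl-ext']")

print(rows)
print(len(all_sites))
```

The second block has no email link, so its tuple carries None in that position while its URL stays attached to the right company.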

Upvotes: 2

Zaif Senpai

Reputation: 91

There are two ways to do it.

First: Use a single selector that targets the element among the children of that 'div'. You can call a find_elements function first to count the parent 'div's and then loop that many times. This method is not recommended.

Second: Call the find_element family of functions on a WebElement object, which limits the search to that element's descendants.

Assume that I am working on this website.

from selenium.webdriver.common.by import By  # `driver` is an already created WebDriver

### First method: one document-wide selector per block
first_title = driver.find_element(By.CSS_SELECTOR, ".row.test-site:nth-of-type(1) h2")  # get first title
second_title = driver.find_element(By.CSS_SELECTOR, ".row.test-site:nth-of-type(2) h2")  # get second title
# ... and so on.

### Second method: search inside each block element
div_els = driver.find_elements(By.CSS_SELECTOR, ".row.test-site")  # get list of all divs
# You can now loop through all divs in order to do the following:
first_title = div_els[0].find_element(By.CSS_SELECTOR, "h2")  # get first title
second_title = div_els[1].find_element(By.CSS_SELECTOR, "h2")  # get second title
# ... and so on.

Upvotes: 4
