Reputation: 199
I'm using Python and Selenium to scrape a site that lists company information.
For each company I need two data points: the email address and the URL.
The problem is that some companies have no email listed. If I collect the URLs and the emails as two separate lists, I can't match them up: the email list will be shorter than the URL list, and I won't know which companies are missing an email.
So I thought there might be a way to get the root element of each company's block (say, a div with class "provider") and then search inside each block for the email and the URL.
Is this possible, and if so, how?
Upvotes: 1
Views: 336
Reputation: 14135
Here is the complete logic.
from selenium import webdriver

driver = webdriver.Chrome()  # assumption: Chrome; any WebDriver works here
url = "https://clutch.co/web-designers?page=0"
driver.get(url)
# One element per company block
pros = driver.find_elements_by_css_selector("li.provider-row")
providers = []
for provider in pros:
    # Lookups on `provider` only search inside that company's block
    pUrl = provider.find_element_by_css_selector(".website-link.website-link-a a").get_attribute("realurl")
    # find_elements returns an empty list (no exception) when no email is listed
    emails = provider.find_elements_by_css_selector(".contact-dropdown .item a")
    if emails:
        pEmail = emails[0].get_attribute('textContent')
    else:
        pEmail = ''
    providers.append("{" + pUrl + "," + pEmail + "}")
print(providers)
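For readers on Selenium 4, where the find_element_by_* helpers were removed (4.3 and later), the same per-block logic can be sketched with find_elements(by, value). This is a minimal sketch and not the answerer's code: the helper name extract_pairs and the dict layout are made up for illustration, the selectors are copied from the answer above and may no longer match the live page, and the string "css selector" is used because it is the value of By.CSS_SELECTOR.

```python
# Locator strategy string; this is the value of
# selenium.webdriver.common.by.By.CSS_SELECTOR.
CSS = "css selector"


def extract_pairs(driver):
    """Return one {'url': ..., 'email': ...} dict per provider block.

    `driver` is any object exposing Selenium 4's find_elements(by, value),
    e.g. a WebDriver that has already loaded the listing page.
    extract_pairs is a hypothetical helper name, not part of Selenium.
    """
    pairs = []
    for block in driver.find_elements(CSS, "li.provider-row"):
        # Both lookups are scoped to this block, so URL and email stay paired.
        links = block.find_elements(CSS, ".website-link.website-link-a a")
        emails = block.find_elements(CSS, ".contact-dropdown .item a")
        pairs.append({
            # find_elements returns [] when nothing matches, so a missing
            # field becomes None instead of raising NoSuchElementException.
            "url": links[0].get_attribute("realurl") if links else None,
            "email": emails[0].get_attribute("textContent") if emails else None,
        })
    return pairs
```

Called as extract_pairs(driver) after driver.get(url), this should return one entry per provider row, with email set to None wherever the page lists none, so the URL and email lists can no longer drift out of step.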
Upvotes: 4
Reputation: 199
OK, I found a solution.
First, collect all the blocks that contain the fields you need. Example:
providers = browser.find_elements_by_class_name('provider-row')
Then call find_elements_by_xpath() on one of those blocks with a locator that starts with ".//", which restricts the search to descendants of that element (a locator starting with "//" would search the whole document instead). Example:
providers[0].find_elements_by_xpath(".//li[@class='website-link website-link-a']/a[@class='sl-ext']")
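To see what the leading "." buys you without launching a browser, here is a small sketch that uses the standard library's xml.etree.ElementTree in place of Selenium. ElementTree supports only a subset of XPath, and the markup below is invented sample data that mimics the class names in the locator above (the "email" class is made up), so treat it as an illustration of a scoped search rather than the site's real structure.

```python
import xml.etree.ElementTree as ET

# Invented, well-formed sample markup; the second provider has no email.
SAMPLE = """
<ul>
  <li class="provider-row">
    <ul>
      <li class="website-link website-link-a"><a class="sl-ext" href="https://a.example">Site</a></li>
      <li class="email"><a href="mailto:a@a.example">a@a.example</a></li>
    </ul>
  </li>
  <li class="provider-row">
    <ul>
      <li class="website-link website-link-a"><a class="sl-ext" href="https://b.example">Site</a></li>
    </ul>
  </li>
</ul>
""".strip()

root = ET.fromstring(SAMPLE)
pairs = []
for block in root.findall("./li[@class='provider-row']"):
    # ".//" searches only the descendants of this block, never its siblings.
    link = block.find(".//li[@class='website-link website-link-a']/a[@class='sl-ext']")
    mail = block.find(".//li[@class='email']/a")
    pairs.append((
        link.get("href") if link is not None else None,
        mail.text if mail is not None else None,
    ))

print(pairs)
# [('https://a.example', 'a@a.example'), ('https://b.example', None)]
```

Because each lookup starts from its own block, the second provider's missing email shows up as None in the right position instead of shortening a separate email list.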
Upvotes: 2
Reputation: 91
There are two ways to do it.
First: write a selector that reaches the element inside each parent 'div' by its position. You can use a find_elements call to count the parent 'div's first, and then loop that many times. This method is not recommended.
Second: call the find_element family of functions directly on a WebElement object, which limits the search to that element's descendants.
Assume that I am working on this website.
### First method:
FirstTitleInDiv = driver.find_element_by_css_selector(".row.test-site:nth-of-type(1) h2") # get first title
SecondTitleInDiv = driver.find_element_by_css_selector(".row.test-site:nth-of-type(2) h2") # get second title
# ... and so on.
### Second method:
Div_Els = driver.find_elements_by_css_selector(".row.test-site") # get list of all divs
# You can now loop through all divs to do the following:
FirstTitleInDiv = Div_Els[0].find_element_by_css_selector("h2") # get first title
SecondTitleInDiv = Div_Els[1].find_element_by_css_selector("h2") # get second title
# ... and so on.
Upvotes: 4