Reputation: 23
https://rocketreach.co/horizon-blue-cross-blue-shield-of-new-jersey-email-format_b5c604a3f42e0c54 This is the page I'm trying to get information from. I need to extract the formats listed in the table ("first '_' last", "first_initial last", and so on). If not all of them, then at least the topmost format.
Here's what I have so far:
def search_on_google(key_word, driver):
    driver.get("https://www.google.com/")
    searchBoard = driver.find_element_by_name('q')
    searchBoard.send_keys(key_word + " Rocketreach.co")
    searchBoard.send_keys(Keys.TAB)
    searchBoard.send_keys(Keys.ENTER)
    driver.find_element_by_tag_name("cite").click()
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    for link in soup.find_all('meta'):
        content = link.get('content')
        print(content)
Edit:
    for i in range(1):
        driver.find_element_by_tag_name("cite").click()
        soup = BeautifulSoup(driver.page_source, 'html.parser')
        WebDriverWait(driver, 10).until(EC.presence_of_element_located(
            (By.XPATH, "//table/tbody/tr[1]/td[1][not(contains(text(), 'Lorem ipsum...'))]")))
        table_id = driver.find_element(By.TAG_NAME, "tbody")
        rows = table_id.find_elements(By.TAG_NAME, "tr")
        for row in rows:
            tds = row.find_elements(By.TAG_NAME, "td")
            top_format.append(tds[0].text)
            domain.append(tds[1].text)
        print(top_format)
        print(domain)
        break
    return top_format
Upvotes: 0
Views: 472
Reputation: 9969
There's only one table on this page, and it is not inside an iframe, so you can print everything in it with the following:
# Wait until the first cell holds real data rather than the placeholder text.
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "//table/tbody/tr[1]/td[1][not(contains(text(), 'Lorem ipsum...'))]")))
table_id = driver.find_element(By.TAG_NAME, "tbody")
rows = table_id.find_elements(By.TAG_NAME, "tr")
one_urls = []  # collects the text of every cell, row by row
for row in rows:
    tds = row.find_elements(By.TAG_NAME, "td")
    for td in tds:
        one_urls.append(td.text)
print(one_urls)
You could add a check on the cell text before the print, or limit the loop with a range.
if tds[0].text == '':
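As a rough sketch of that check and limit, assuming the cell text has already been read into plain lists (for example with `[td.text for td in tds]`), something like the following could work. The helper name `collect_formats`, its `limit` parameter, and the sample rows are hypothetical and not part of the page or of Selenium.

```python
def collect_formats(rows, limit=None):
    """Return (first_cell, second_cell) pairs from rows of cell text.

    rows  -- list of lists of strings, one inner list per table row
    limit -- keep at most this many rows (None keeps all of them)
    """
    results = []
    for cells in rows:
        # Stop once enough rows have been collected (the "range" idea).
        if limit is not None and len(results) >= limit:
            break
        # Skip rows with no cells or an empty first cell (the "check" idea).
        if not cells or cells[0].strip() == "":
            continue
        first = cells[0].strip()
        second = cells[1].strip() if len(cells) > 1 else ""
        results.append((first, second))
    return results


# Hypothetical sample data standing in for the scraped table.
sample_rows = [
    ["", ""],
    ["first '_' last", "jane_doe@example.com"],
    ["first_initial last", "jdoe@example.com"],
]
print(collect_formats(sample_rows, limit=1))
```

Passing `limit=1` keeps only the topmost non-empty row, while leaving `limit` unset returns every non-empty row.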
I'd also suggest waiting before you look up the table, since you're clicking and loading a new page just before reading it.
table_id = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.TAG_NAME, "tbody")))
Import these:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
Upvotes: 1