Reputation: 81
I want to get the source code of a website using selenium, find a particular element using BeautifulSoup, and then convert that element back into a selenium WebElement (selenium.webdriver.remote.webelement.WebElement). Like so:
driver.get("http://www.google.com")
soup = BeautifulSoup(driver.page_source, "html.parser")
element = soup.find(title="Search")
element = Selenium.webelement(element)
element.click()
How can I achieve this?
Upvotes: 8
Views: 8381
Reputation: 561
A general solution that worked for me is to compute the XPath of the bs4 element, then use that XPath to find the element in selenium:
xpath = xpath_soup(soup_element)
selenium_element = driver.find_element(by='xpath', value=xpath)
...
import itertools

def xpath_soup(element):
    """
    Generate xpath of soup element
    :param element: bs4 text or node
    :return: xpath as string
    """
    components = []
    # Text nodes have no tag name, so start from their parent tag.
    child = element if element.name else element.parent
    for parent in child.parents:
        # parent is a bs4.element.Tag
        previous = itertools.islice(parent.children, 0, parent.contents.index(child))
        xpath_tag = child.name
        # 1-based position among preceding siblings with the same tag name
        xpath_index = sum(1 for i in previous if i.name == xpath_tag) + 1
        components.append(xpath_tag if xpath_index == 1 else '%s[%d]' % (xpath_tag, xpath_index))
        child = parent
    components.reverse()
    return '/%s' % '/'.join(components)
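The idea behind xpath_soup is positional: each step of the path is a tag name plus its 1-based index among same-named siblings. To check that idea without a browser, here is a minimal sketch that swaps bs4 for the standard library's xml.etree.ElementTree. The swap is an assumption made only so the example runs on its own; positional_path and the sample markup are hypothetical and not part of the answer above. It picks the node by identity (is), so two siblings with identical content still get different indices.

```python
import xml.etree.ElementTree as ET

def positional_path(root, target):
    """Return a positional path from root to target, relative to root."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    steps = []
    node = target
    while node is not root:
        parent = parents[node]
        same_tag = [c for c in parent if c.tag == node.tag]
        # Compare by identity so look-alike siblings are told apart.
        index = next(i for i, c in enumerate(same_tag, 1) if c is node)
        steps.append(node.tag if len(same_tag) == 1 else "%s[%d]" % (node.tag, index))
        node = parent
    steps.reverse()
    return "./" + "/".join(steps) if steps else "."

# Hypothetical sample markup with two identical <li> elements.
doc = ET.fromstring("<html><body><ul><li>a</li><li>a</li></ul></body></html>")
second_li = doc.findall(".//li")[1]
path = positional_path(doc, second_li)
print(path)
# The generated path resolves back to the very same element object.
print(doc.find(path) is second_li)
```

One caveat when using such paths with selenium: the path is computed from the parsed page source, while the browser matches it against the live DOM, which can differ (for example, browsers insert tbody into tables and scripts may add or remove nodes), so a positional path may not always match.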
Upvotes: 9
Reputation: 291
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
driver = webdriver.Chrome()
driver.get("http://www.google.com")
soup = BeautifulSoup(driver.page_source, 'html.parser')
search_soup_element = soup.find(title="Search")
input_element = soup.select('input.gsfi.lst-d-f')[0]
search_box = driver.find_element(by='name', value=input_element.attrs['name'])
search_box.send_keys('Hello World!')
search_box.send_keys(Keys.RETURN)
This pretty much works. I can see reasons for combining webdriver and BeautifulSoup, though this particular example does not really need both.
Upvotes: 1
Reputation: 180502
I don't know of any way to pass an element from bs4 to selenium, but you can just use selenium to find the element:
driver.find_element('xpath', '//input[@title="Search"]').click()
Or, to match on the title attribute alone, like your bs4 find:
driver.find_element('xpath', '//*[@title="Search"]').click()
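If the attributes come from a bs4 element rather than being typed by hand, the same kind of XPath can be built from them. The helper below is a hypothetical sketch (xpath_from_attrs is not part of selenium or bs4) and assumes plain string attribute values; bs4 returns multi-valued attributes such as class as lists, which would need separate handling.

```python
def xpath_from_attrs(attrs, tag="*"):
    """Build an XPath such as //input[@title="Search"] from string attributes."""
    predicates = []
    for name, value in attrs.items():
        # XPath 1.0 string literals have no escape syntax, so pick a quote
        # style the value does not contain and reject values that use both.
        if '"' in value and "'" in value:
            raise ValueError("value mixes both quote styles: %r" % value)
        quote = "'" if '"' in value else '"'
        predicates.append("[@%s=%s%s%s]" % (name, quote, value, quote))
    return "//%s%s" % (tag, "".join(predicates))

print(xpath_from_attrs({"title": "Search"}, tag="input"))
print(xpath_from_attrs({"title": "Search"}))
# With a live driver (assumed to exist), the result could be used as:
# driver.find_element("xpath", xpath_from_attrs({"title": "Search"})).click()
```

An attribute-based path like this does not depend on element positions, so it tends to hold up better than a positional path when the page layout shifts, provided the chosen attributes identify the element uniquely.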
Upvotes: 0