Reputation: 13
This is the HTML div (class gs_a) that contains the data I wish to print:
<div class="gs_a">LR Binford - American antiquity, 1980 - cambridge.org </div>
This is my code so far:
from selenium import webdriver

def Author (SearchVar):
    driver = webdriver.Chrome("/Users/tutau/Downloads/chromedriver")
    driver.get ("https://scholar.google.com/")
    SearchBox = driver.find_element_by_id ("gs_hdr_tsi")
    SearchBox.send_keys(SearchVar)
    SearchBox.submit()
    At = driver.find_elements_by_css_selector ('#gs_res_ccl_mid > div:nth-child(1) > div.gs_ri > div.gs_a')
    print (At)

Author("dog")
All that comes out when I print is:
selenium.webdriver.remote.webelement.WebElement (session="9aa956e2bd51f510dd626f6937b01c0e", element="0.6506218589189958-1")
and not the text. I am new to Selenium, so any help is appreciated.
Upvotes: 0
Views: 4661
Reputation: 193328
You were almost there. Given the HTML and the code you have shared, the output you are seeing is exactly what that code is expected to print.
Once the following line of code gets executed:
At = driver.find_elements_by_css_selector ('#gs_res_ccl_mid > div:nth-child(1) > div.gs_ri > div.gs_a')
At is a list that holds the desired WebElement (a single element in your case). In your next step you call print (At),
so the list itself is printed, which shows the WebElement object rather than its text:
selenium.webdriver.remote.webelement.WebElement (session="9aa956e2bd51f510dd626f6937b01c0e", element="0.6506218589189958-1")
Now, as per your question, if you want to extract the text LR Binford - American antiquity, 1980 - cambridge.org, you have to use either of the following on the element itself:
text: Gets the text of the element.
get_attribute(attributeName): Gets the given attribute or property of the element.
Because At is a list, index into it (or loop over it) first. So you need to change the line of code from:
print (At)
To either of the following:
Using text:
print(At[0].text)
Using get_attribute(attributeName):
print(At[0].get_attribute("innerHTML"))
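To see the difference without opening a browser, here is a minimal sketch that uses a hypothetical FakeElement class as a stand-in for Selenium's WebElement. The class and the element_texts helper are assumptions made for illustration and are not part of Selenium.

```python
class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement (illustration only)."""

    def __init__(self, text):
        self.text = text  # mirrors the WebElement.text property

    def get_attribute(self, name):
        # Only "innerHTML" is modelled here; a real element supports many more.
        return self.text if name == "innerHTML" else None


def element_texts(elements):
    """Return the text of every element in a find_elements-style list."""
    return [element.text for element in elements]


# find_elements_* returns a list, even when only one element matches.
At = [FakeElement("LR Binford - American antiquity, 1980 - cambridge.org")]

print(At)                 # prints the list of objects, not their text
print(element_texts(At))  # prints the text held by each element
```

The first print shows the object representation, which is the same kind of output as in the question; the second shows the text.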
Your own code with minor adjustments:
# -*- coding: UTF-8 -*-
from selenium import webdriver

def Author (SearchVar):
    options = webdriver.ChromeOptions()
    options.add_argument("start-maximized")
    options.add_argument('disable-infobars')
    driver=webdriver.Chrome(chrome_options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
    driver.get ("https://scholar.google.com/")
    SearchBox = driver.find_element_by_name("q")
    SearchBox.send_keys(SearchVar)
    SearchBox.submit()
    At = driver.find_elements_by_css_selector ('#gs_res_ccl_mid > div:nth-child(1) > div.gs_ri > div.gs_a')
    for item in At:
        print(item.text)

Author("dog")
Console Output:
…, RJ Marles, LS Pellicore, GI Giancaspro, TL Dog - Drug Safety, 2008 - Springer
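A note for readers on newer Selenium releases: the find_elements_by_css_selector style helpers used above are not available in recent Selenium 4 versions, where the equivalent call is find_elements(By.CSS_SELECTOR, selector). The sketch below wraps that call in a small helper; the function name result_meta_texts and the constant names are my own, and the helper assumes it is handed a driver that has already loaded the results page.

```python
# The string below is the value of selenium.webdriver.common.by.By.CSS_SELECTOR;
# it is spelled out so this helper can be read and tried without importing Selenium.
CSS_SELECTOR_STRATEGY = "css selector"

# Same selector as in the code above.
RESULT_META = "#gs_res_ccl_mid > div:nth-child(1) > div.gs_ri > div.gs_a"


def result_meta_texts(driver, selector=RESULT_META):
    """Return the text of every element on the current page matching selector."""
    elements = driver.find_elements(CSS_SELECTOR_STRATEGY, selector)
    return [element.text for element in elements]


# Usage with a real browser (assumes Selenium 4 and a working Chrome driver setup):
#   from selenium import webdriver
#   driver = webdriver.Chrome()
#   driver.get("https://scholar.google.com/")
#   ... submit the search as shown above ...
#   for text in result_meta_texts(driver):
#       print(text)
```

Returning a list of strings keeps the browser-specific part in one place and leaves printing to the caller.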
Upvotes: 1
Reputation: 4028
Intro
First, I recommend running the CSS selection on Selenium's page_source
with a faster parser such as lxml.
import lxml.html  # the cssselect() method also needs the "cssselect" package installed

# put this below SearchBox.submit()
CSS_SELECTOR = '#gs_res_ccl_mid > :nth-child(1) > .gs_ri > .gs_a'  # CSS selector for the target div
source = driver.page_source                     # full HTML of the current page
At_raw = lxml.html.document_fromstring(source)  # parse the HTML into an lxml tree
At = At_raw.cssselect(CSS_SELECTOR)             # list of elements matching the selector
Solution 1
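Then, you need to extract the text_content() from the matched element. cssselect() returns a list, so index into it first.
Explicit UTF-8 encoding is only needed on Python 2; on Python 3 text_content() already returns a str.
At = At[0].text_content()  # text of the first match; on Python 2 append .encode('utf-8')
print(At)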
Solution 2
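If the matched text spans more than one line or contains non-ASCII characters, you can split it into lines and strip those characters.
Note that str.replace() does not understand regular expressions, so re.sub() is used here:
import re

At = [re.sub(r'[^\x00-\x7F]+', '', l)                    # remove non-ASCII characters
      for line in At
      for l in line.text_content().strip().splitlines()  # text of each match, line by line
      if l.strip()]                                      # keep only lines that contain characters
print(At)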
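The cleanup step can also be tried on a plain string, without lxml or a browser. This is a minimal sketch under that assumption; the function name clean_lines and the sample string are made up for illustration, and unlike the snippet above it also trims surrounding whitespace from each line.

```python
import re

NON_ASCII = re.compile(r"[^\x00-\x7F]+")  # one or more characters outside 7-bit ASCII


def clean_lines(text):
    """Return the non-blank lines of text, each with non-ASCII characters removed."""
    return [
        NON_ASCII.sub("", line).strip()
        for line in text.strip().splitlines()
        if line.strip()
    ]


# Hypothetical sample containing an en dash (\u2013) and a blank line.
sample = "LR Binford \u2013 American antiquity\n\n  1980 \u2013 cambridge.org  "
print(clean_lines(sample))
```

Removing a character leaves the spaces on both sides of it in place, so further whitespace normalisation may be wanted depending on how the text is used.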
Upvotes: 1