Te Uruti Tau

Reputation: 13

Trying to get text out of a div class using Selenium in Python

HTML div class that contains the data I wish to print

<div class="gs_a">LR Binford&nbsp;- American antiquity, 1980 - cambridge.org </div>

This is my code so far :

from selenium import webdriver

def Author (SearchVar):
    driver = webdriver.Chrome("/Users/tutau/Downloads/chromedriver")
    driver.get ("https://scholar.google.com/")
    SearchBox = driver.find_element_by_id ("gs_hdr_tsi")
    SearchBox.send_keys(SearchVar)
    SearchBox.submit()
    At = driver.find_elements_by_css_selector ('#gs_res_ccl_mid > div:nth-child(1) > div.gs_ri > div.gs_a')
    print (At)

Author("dog")

All that comes out when I print is

selenium.webdriver.remote.webelement.WebElement (session="9aa956e2bd51f510dd626f6937b01c0e", element="0.6506218589189958-1")

not the text. I am new to Selenium; help is appreciated.

Upvotes: 0

Views: 4661

Answers (3)

undetected Selenium

Reputation: 193328

You were almost there. Given the HTML and the code you shared, the output you are seeing is exactly what print (At) is expected to produce.

Explanation

Once the following line of code gets executed:

At = driver.find_elements_by_css_selector ('#gs_res_ccl_mid > div:nth-child(1) > div.gs_ri > div.gs_a')

At refers to a list holding the desired element (a single element in this case). When you then call print (At), what gets printed is the WebElement object itself, not its text:

selenium.webdriver.remote.webelement.WebElement (session="9aa956e2bd51f510dd626f6937b01c0e", element="0.6506218589189958-1")
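
To illustrate why an object representation shows up instead of the text, here is a minimal sketch that runs without a browser. FakeElement is a hypothetical stand-in written for this example, not Selenium's WebElement; it only mimics having a .text attribute and an opaque repr.

```python
class FakeElement:
    """Hypothetical stand-in for a WebElement (not Selenium's real class)."""

    def __init__(self, text):
        self.text = text

    def __repr__(self):
        return "<FakeElement (element='0.65-1')>"


# find_elements_* returns a list, so the stand-in result is a list too.
At = [FakeElement("LR Binford - American antiquity, 1980 - cambridge.org")]

shown_for_list = str(At)                       # what print(At) would display
shown_for_text = [item.text for item in At]    # the text you actually want

print(shown_for_list)
for line in shown_for_text:
    print(line)
```

Printing the list shows the repr of each object inside it; reading .text from each item gives the string.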

Solution

To extract the text LR Binford - American antiquity, 1980 - cambridge.org, read it from the element itself instead of printing the element.

So you need to change the line of code from:

print (At)

To either of the following:

  • Using text:

    print(At[0].text)  # At is a list, so index into it first
    
  • Using get_attribute(attributeName):

    print(At[0].get_attribute("innerHTML"))  # At is a list, so index into it first
    
  • Your own code with minor adjustments:

    # -*- coding: UTF-8 -*-
    from selenium import webdriver
    
    def Author (SearchVar):
    
        options = webdriver.ChromeOptions() 
        options.add_argument("start-maximized")
        options.add_argument('disable-infobars')
        driver=webdriver.Chrome(chrome_options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
        driver.get ("https://scholar.google.com/")
        SearchBox = driver.find_element_by_name("q")
        SearchBox.send_keys(SearchVar)
        SearchBox.submit()
        At = driver.find_elements_by_css_selector ('#gs_res_ccl_mid > div:nth-child(1) > div.gs_ri > div.gs_a')
        for item in At:
            print(item.text)
    
    Author("dog")
    
  • Console Output:

    …, RJ Marles, LS Pellicore, GI Giancaspro, TL Dog - Drug Safety, 2008 - Springer
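
The two options above are not quite interchangeable: innerHTML returns the raw markup inside the element, so entities such as &nbsp; come back undecoded. The sketch below uses only the standard library to show the decoding step on the HTML from the question. The inner_html value is an assumption about what get_attribute("innerHTML") would return for that div, and Selenium's own .text applies its own whitespace handling, so its result may differ slightly.

```python
import html

# Assumed innerHTML of the div from the question (copied from the HTML shown there).
inner_html = "LR Binford&nbsp;- American antiquity, 1980 - cambridge.org "

decoded = html.unescape(inner_html)                # "&nbsp;" becomes U+00A0
cleaned = decoded.replace("\u00a0", " ").strip()   # plain spaces, no trailing blank

print(cleaned)
```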
    

Upvotes: 1

sudonym

Reputation: 4028

Intro

First, I recommend running the CSS selector against Selenium's page_source with a faster parser such as lxml.

import lxml.html  # .cssselect() also needs the separate "cssselect" package installed

# put this below SearchBox.submit()

CSS_SELECTOR = '#gs_res_ccl_mid > :nth-child(1) > .gs_ri > .gs_a' # Define css
source = driver.page_source                                       # Get all html
At_raw = lxml.html.document_fromstring(source)                    # Convert
At = At_raw.cssselect(CSS_SELECTOR)                               # Select by CSS

Solution 1

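Then extract the text_content() from the matched element. cssselect() returns a list, so take the first match; in Python 3 the result is already a str and needs no explicit encoding before printing.

At = At[0].text_content()  # Get the text of the first match
print(At)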

Solution 2

In case At contains more than one line or non-ASCII characters, you can split the lines and strip those characters as well:

import re

At = [re.sub(r'[^\x00-\x7F]+', '', l).strip()    # remove non-ASCII characters
      for line in At                             # each matched element
      for l in line.text_content().splitlines()  # each line of its text
      if l.strip()]                              # only keep lines that contain characters
print(At)
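
If lxml is not available, the same idea (parse page_source offline instead of querying the browser) can be sketched with the standard library's html.parser. This is a swapped-in technique, not the lxml approach above: the GsATextParser class is a hypothetical helper, it matches only on the gs_a class rather than the full CSS selector, and the source string is a stand-in for driver.page_source built from the HTML in the question.

```python
from html.parser import HTMLParser


class GsATextParser(HTMLParser):
    """Collect the text of every <div class="gs_a"> (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.depth = 0        # > 0 while inside a matching div
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1               # nested div inside a match
        elif "gs_a" in classes:
            self.depth = 1
            self.results.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.results[-1] += data


# Stand-in for driver.page_source, built from the HTML in the question.
source = ('<div class="gs_ri"><div class="gs_a">LR Binford&nbsp;- '
          'American antiquity, 1980 - cambridge.org </div></div>')

parser = GsATextParser()
parser.feed(source)
parser.close()

# Normalise the non-breaking space and trim surrounding whitespace.
texts = [t.replace("\u00a0", " ").strip() for t in parser.results]
print(texts)
```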

Upvotes: 1

Monika

Reputation: 732

You are printing the element itself. Print its text instead: since find_elements returns a list, use At[0].text (or loop over At and print each item's .text).

Upvotes: 0
