PythonLearn

Reputation: 97

Extract <a> content in Python, Selenium Webdriver

I am writing a script that checks an auction portal for new auctions that interest me. So far the script selects the item name, category, and time added, and builds a list of auctions. This is where my problem starts. My code:

# List of auctions
time.sleep(2)
lists = driver.find_elements_by_class_name("vela__item__1FnoI")
print("Found " + str(len(lists)) + " auctions")

for link in driver.find_elements_by_xpath('//div[@class="vela__item__1FnoI"]//a'):
    print(link.get_attribute('href') + "-" + link.text)

Right now the output looks horrible:

<selenium.webdriver.firefox.webelement.FirefoxWebElement (session="dae57d0d-9570-4693-bb7f-8aa31ab24699", element="49e4afcd-f6c3-4b62-bba0-a3b21e08c78d")>
<selenium.webdriver.firefox.webelement.FirefoxWebElement (session="dae57d0d-9570-4693-bb7f-8aa31ab24699", element="3f2a9f43-26b8-40f6-a4b6-497d46e41598")>
(and so on)

Please help me get output that looks like this:

http://allegro.pl/doris-wozek-dla-lalek-3f-nosidlo-torba-posciel-15k-i6735944795.html - DORIS WÓZEK DLA LALEK 3F NOSIDŁO TORBA POŚCIEL 15K

http://allegro.pl/sukienka-ubranko-dla-lalki-barbie-de-lux-i6739976160.html - Sukienka ubranko dla lalki Barbie! DE LUX!

HTML of the search results:

<article class="item__item__2lO83 ">
                    <div class="vela__item__1FnoI">
                        <div class="vela__item__details__1di9R">
                            <div class="photo__thumbnail__1SaYl ">
                                <noscript>
                                    <i><img src="https://1.allegroimg.com/s128/0166b6/964534be46848305f499770a74f1" alt="DORIS WÓZEK DLA LALEK 3F NOSIDŁO TORBA POŚCIEL 15K" /></i>
                                </noscript>
                            </div>
                            <h2 class="header__title__2RWO4">
                                <a href="http://allegro.pl/doris-wozek-dla-lalek-3f-nosidlo-torba-posciel-15k-i6735944795.html">DORIS WÓZEK DLA LALEK 3F NOSIDŁO TORBA POŚCIEL 15K</a>
                            </h2>
                        </div>
                    </div>
                </article><article class="item__item__2lO83 ">
                    <div class="vela__item__1FnoI">
                        <div class="vela__item__details__1di9R">
                            <div class="photo__thumbnail__1SaYl ">
                                <noscript>
                                    <i><img src="https://e.allegroimg.com/s128/0129ef/ec0ceef742ce9cdecbe3465a67fe" alt="Sukienka ubranko dla lalki Barbie! DE LUX!" /></i>
                                </noscript>
                            </div>
                            <h2 class="header__title__2RWO4">
                                <a href="http://allegro.pl/sukienka-ubranko-dla-lalki-barbie-de-lux-i6739976160.html">Sukienka ubranko dla lalki Barbie! DE LUX!</a>
                            </h2>
                        </div>
                    </div>
                </article>

Upvotes: 1

Views: 821

Answers (2)

Andersson

Reputation: 52665

You can use the code below to extract the links and their text:

for link in driver.find_elements_by_xpath('//div[@class="vela__item__1FnoI "]//a'):
    print(link.get_attribute('href') + "-" + link.text)
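
Two caveats may be worth keeping in mind. An exact `@class="..."` comparison in XPath is sensitive to whitespace, and the HTML in the question shows the `div` class without a trailing space, so a `contains()` test is a more forgiving match. Also, newer Selenium 4 releases dropped the `find_elements_by_*` helpers in favour of `find_elements(by, value)`. The following is only a sketch under those assumptions: `collect_links` and `print_links` are hypothetical helper names, and `driver` is assumed to be an already-started WebDriver.

```python
# Hypothetical helpers, not part of Selenium itself.
# `driver` is assumed to be an already-started WebDriver instance.

# contains() tolerates extra whitespace or additional classes in the attribute.
ITEM_LINKS_XPATH = '//div[contains(@class, "vela__item__1FnoI")]//a'


def collect_links(driver, xpath=ITEM_LINKS_XPATH):
    """Return a list of (href, text) pairs for every matching <a> element."""
    # "xpath" is the value of selenium's By.XPATH constant; the literal is
    # used here so the sketch does not need a selenium import.
    anchors = driver.find_elements("xpath", xpath)
    return [(a.get_attribute("href"), a.text.strip()) for a in anchors]


def print_links(driver):
    """Print each auction as 'href - title', one per line."""
    for href, text in collect_links(driver):
        print(f"{href} - {text}")
```

Calling `print_links(driver)` after the search results have loaded should print one `href - title` line per auction.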

Upvotes: 1

Guy

Reputation: 50819

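With print (item) you are printing the WebElement object itself, so Python shows its string representation (the `<...FirefoxWebElement (session=..., element=...)>` lines) rather than its content. To print the text, use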

print (item.text)
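
If you iterate over the container elements (the `lists` variable in the question) rather than over the links directly, one possible approach is to look up the `<a>` inside each container and read its `href` and `text`. This is a minimal sketch, assuming each item contains an `h2 > a` as in the HTML shown in the question; `describe_item` is a hypothetical helper name.

```python
def describe_item(item):
    """Format one auction container element as 'href - title'.

    `item` is assumed to be a WebElement for a div.vela__item__1FnoI.
    """
    # "css selector" is the value of selenium's By.CSS_SELECTOR constant;
    # the literal is used so the sketch does not need a selenium import.
    link = item.find_element("css selector", "h2 a")
    return f"{link.get_attribute('href')} - {link.text}"


def describe_items(items):
    """Return one formatted line per auction container."""
    return [describe_item(item) for item in items]
```

With the question's variables this would be used as `for line in describe_items(lists): print(line)`.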

Upvotes: 2
