Braden Fenlong

Reputation: 25

Scraping Color of Product with selenium python

Link to the full source code:

http://www.supremenewyork.com/shop/all/sweatshirts

I'm trying to scrape both the product element and its color from the site. I can already find a product by name and click it, but I want to collect all the products whose names contain a certain keyword and then click the one in the color that I want. Any help is appreciated.

Edit: what I've tried:

product = driver.find_elements_by_partial_link_text(keyword)
for item in product:
    if item.parent.parent.find("p") == wanted_color:
        item.get_attribute("href")

Error:

Traceback (most recent call last):
  File "C:/Users/B/PycharmProjects/BasicSelenium/test.py", line 17, in <module>
    if item.parent.parent.find("p") == color:
AttributeError: 'WebDriver' object has no attribute 'parent'

Upvotes: 1

Views: 800

Answers (2)

JeffC

Reputation: 25686

For something like this I would write a function that takes in a keyword and a color name. You can take those values and insert them into a single XPath and click on the A tag that is returned.

def select_product(keyword, color):
    driver.find_element_by_xpath("//article//a[contains(., '" + keyword + "')]/../../p/a[contains(., '" + color + "')]").click()

You would call it like

select_product("Geto Boys", "Ash Grey")
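One caveat with plain string concatenation is that a keyword containing a quote character (for example "Kid's") produces invalid XPath. Below is a minimal sketch of one way around that. The helpers xpath_literal and product_color_xpath are hypothetical names introduced here, and select_product takes the driver as an argument rather than using a global. It calls find_element with the string "xpath", which is the value of Selenium's By.XPATH; recent Selenium 4 releases removed the find_element_by_* helpers, so this form should work there as well.

```python
def xpath_literal(text):
    """Quote text as an XPath 1.0 string literal, even if it contains quotes."""
    if "'" not in text:
        return "'" + text + "'"
    if '"' not in text:
        return '"' + text + '"'
    # Both quote types present: glue single-quote-free pieces with concat().
    pieces = text.split("'")
    return "concat(" + ", \"'\", ".join("'" + p + "'" for p in pieces) + ")"


def product_color_xpath(keyword, color):
    """Build the same XPath as above, with safely quoted arguments."""
    return (f"//article//a[contains(., {xpath_literal(keyword)})]"
            f"/../../p/a[contains(., {xpath_literal(color)})]")


def select_product(driver, keyword, color):
    # "xpath" is the string value of selenium.webdriver.common.by.By.XPATH
    driver.find_element("xpath", product_color_xpath(keyword, color)).click()


print(product_color_xpath("Geto Boys", "Ash Grey"))
```

With a live driver you would call select_product(driver, "Geto Boys", "Ash Grey").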

Some quick XPath info

// means any depth vs / which means child (one level down)

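a[contains(.,"some text")] means find an A tag that contains the text "some text". The . in contains() refers to the current node; it is converted to the node's string value, which is all the text inside the element, including text in its descendants. It is similar to text(), but broader: text() selects only the element's direct text nodes.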

/.. means go up one level

So putting this all together, it reads: find an ARTICLE tag at any level, then a descendant A tag (at any depth) that contains the keyword text. From that A tag go up two levels, then find a P child with an A child that contains the color text.
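To make that traversal concrete, here is a rough sketch that walks the same path step by step in plain Python, using the standard library's xml.etree.ElementTree instead of a browser. The sample markup is only an assumption about the page structure (ARTICLE > DIV > H1 > A for the name and ARTICLE > DIV > P > A for the color), and find_color_links is a hypothetical helper, not part of Selenium.

```python
import xml.etree.ElementTree as ET

# Assumed, simplified markup; the real page may differ.
SAMPLE = """
<div>
  <article><div><h1><a href="/shop/1">Geto Boys Hooded Sweatshirt</a></h1>
    <p><a href="/shop/1">Ash Grey</a></p></div></article>
  <article><div><h1><a href="/shop/2">Geto Boys Hooded Sweatshirt</a></h1>
    <p><a href="/shop/2">Black</a></p></div></article>
  <article><div><h1><a href="/shop/3">Box Logo Crewneck</a></h1>
    <p><a href="/shop/3">Ash Grey</a></p></div></article>
</div>
"""


def find_color_links(root, keyword, color):
    """Mirror //article//a[contains(., kw)]/../../p/a[contains(., color)]."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    hrefs = []
    for article in root.iter("article"):                  # //article
        for a in article.iter("a"):                       # //a
            if keyword not in "".join(a.itertext()):      # [contains(., kw)]
                continue
            grandparent = parent_of.get(parent_of.get(a))  # /../..
            if grandparent is None:
                continue
            for link in grandparent.findall("p/a"):       # /p/a
                if color in "".join(link.itertext()):     # [contains(., color)]
                    hrefs.append(link.get("href"))
    return hrefs


root = ET.fromstring(SAMPLE.strip())
result = find_color_links(root, "Geto Boys", "Ash Grey")
print(result)  # ['/shop/1']
```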

XPath is a query language in its own right, so it is worth reading an XPath guide.

Side note... I would suggest that you favor finding elements in this order:

  1. by ID
  2. by CSS selector

...then, if you can't find it either of those ways, fall back to XPath to locate elements by contained text. XPath is generally slower and less consistently supported across browsers than CSS selectors. I used it in this case because you needed to find an element by its contained text; otherwise I would have used a CSS selector.
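As a rough illustration of that preference order, the sketch below picks a locator from whatever information is available. choose_locator is a hypothetical helper; the strings "id", "css selector" and "xpath" are the values of Selenium's By.ID, By.CSS_SELECTOR and By.XPATH, so the returned pair can be passed to find_element. It assumes the link text contains no single quote.

```python
def choose_locator(element_id=None, css=None, link_text=None):
    """Return a (by, value) pair, preferring ID, then CSS, then XPath."""
    if element_id:
        return ("id", element_id)        # same string as By.ID
    if css:
        return ("css selector", css)     # same string as By.CSS_SELECTOR
    if link_text:
        # Matching on contained text is what CSS cannot express,
        # so this is the case that falls back to XPath.
        return ("xpath", "//a[contains(., '%s')]" % link_text)
    raise ValueError("no locator information given")


print(choose_locator(css="a.name-link", link_text="Ash Grey"))
```

With a live driver this would be used as driver.find_element(*choose_locator(link_text="Ash Grey")).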

Upvotes: 1

brennan

Reputation: 3493

Here's one way:

from selenium import webdriver

url = 'http://www.supremenewyork.com/shop/all/sweatshirts'  # page linked in the question
browser = webdriver.Chrome()
browser.get(url)
anchors = browser.find_elements_by_class_name('name-link')

This gets us a list of alternating tags like this:

<h1><a class="name-link" href="/shop/blahblah">Very Cool Sweatshirt</a></h1>
<p><a class="name-link" href="/shop/blahblah">Red</a></p>  

We can split the list into pairs and extract text as needed:

n = 2  # each product yields two anchors: the name link, then the color link
products = [anchors[i:i+n] for i in range(0, len(anchors), n)]
for item in products:
    element, description, color = item[0], item[0].text, item[1].text
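Building on the pairing idea, here is a minimal sketch of how the keyword and color from the question could be matched against those pairs. pick_by_color is a hypothetical helper, and the SimpleNamespace objects only stand in for WebElements so the logic can be tried without a browser. It assumes the anchors strictly alternate name, color, name, color.

```python
from types import SimpleNamespace


def pick_by_color(anchors, keyword, wanted_color):
    """Return the first name anchor whose paired color matches, else None."""
    # Even indexes hold name anchors, odd indexes hold color anchors.
    for name_anchor, color_anchor in zip(anchors[0::2], anchors[1::2]):
        name_matches = keyword.lower() in name_anchor.text.lower()
        color_matches = color_anchor.text.strip().lower() == wanted_color.lower()
        if name_matches and color_matches:
            return name_anchor
    return None


# Stand-ins for the WebElements Selenium would return.
fake_anchors = [
    SimpleNamespace(text=t)
    for t in ["Geto Boys Hooded Sweatshirt", "Black",
              "Geto Boys Hooded Sweatshirt", "Ash Grey"]
]
chosen = pick_by_color(fake_anchors, "geto boys", "ash grey")
print(chosen is fake_anchors[2])  # True
```

With real elements, the returned anchor could then be clicked with chosen.click().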

Or we can filter the anchors by the tag name of their parent element:

products = []
for element in anchors:
    if element.find_element_by_xpath('..').tag_name == 'p':  # or 'h1'
        text = element.text
        products.append([element, text])

Upvotes: 0
