Daniel Hanczyc

Reputation: 182

python xpath selector from div

I started playing with Python and came across something that should be very simple, but I cannot make it work. I have the HTML below:

<h2 class="sr-only">Available Products</h2>
<div id="productlistcontainer" data-defaultpageno="1" data-descfilter="" class="columns4 columnsmobile2" data-noproductstext="No Products Found" data-defaultsortorder="rank" data-fltrselectedcurrency="GBP" data-category="Category1" data-productidstodisableshortcutbuttons="976516" data-defaultpagelength="100" data-searchtermcategory="" data-noofitemsingtmpost="25">
    <ul id="navlist" class="s-productscontainer2">

What I need is to use parser.xpath to get the value of the data-category attribute.

I'm trying, for example:

cgy = xpath('//div["data-category"]')

What am I doing wrong?

Upvotes: 0

Views: 761

Answers (2)

antfuentes87

Reputation: 897

Personally, I use lxml.html for parsing because I find it fast and easy to work with. I could have shortened the way the category is extracted, but I wanted to show as much detail as possible so you can see what is going on.

# Note: tree.cssselect() also needs the separate cssselect package to be installed.
from lxml import html

def extract_data_category(tree):
    elements = [
        e
        for e in tree.cssselect('div#productlistcontainer')
        if e.get('data-category') is not None
    ]
    element = elements[0]
    content = element.get('data-category')
    return content

response = """
<h2 class="sr-only">Available Products</h2>
<div id="productlistcontainer" data-defaultpageno="1" data-descfilter="" class="columns4 columnsmobile2" data-noproductstext="No Products Found" data-defaultsortorder="rank" data-fltrselectedcurrency="GBP" data-category="Category1" data-productidstodisableshortcutbuttons="976516" data-defaultpagelength="100" data-searchtermcategory="" data-noofitemsingtmpost="25">
<ul id="navlist" class="s-productscontainer2">
"""

tree = html.fromstring(response)
data_category = extract_data_category(tree)
print(data_category)
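
Since the question asks specifically about XPath, here is a minimal sketch of the same lookup with lxml's xpath(). In XPath an attribute is written with a leading @, so '//div["data-category"]' does not test for the attribute: the quoted string is a non-empty literal, which the predicate treats as true, so the expression selects div elements regardless. The markup is trimmed from the question and the variable names are only for illustration.

```python
from lxml import html

# Markup from the question, trimmed to the attributes that matter here.
doc = """
<h2 class="sr-only">Available Products</h2>
<div id="productlistcontainer" class="columns4 columnsmobile2" data-category="Category1">
    <ul id="navlist" class="s-productscontainer2"></ul>
</div>
"""

tree = html.fromstring(doc)

# [@data-category] keeps only div elements that carry the attribute;
# the result is a list of elements, not attribute values.
elements = tree.xpath('//div[@data-category]')

# Ending the path with /@data-category returns the attribute values themselves.
values = tree.xpath('//div[@id="productlistcontainer"]/@data-category')

# xpath() always returns a list here, so guard against an empty result.
category = values[0] if values else None
print(category)
```

If you already hold the element, elements[0].get('data-category') gives the same value.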

Upvotes: 1

KunduK

Reputation: 33384

Try Selenium WebDriver with Python.

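from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("url here")
# find_element_by_xpath was removed in Selenium 4; use find_element(By.XPATH, ...) there.
element = driver.find_element(By.XPATH, "//div[@id='productlistcontainer']")
print(element.get_attribute('data-category'))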

Or you can use BeautifulSoup, which is a Python library.

from bs4 import BeautifulSoup

doc = """
<h2 class="sr-only">Available Products</h2>
<div id="productlistcontainer" data-defaultpageno="1" data-descfilter="" class="columns4 columnsmobile2" data-noproductstext="No Products Found" data-defaultsortorder="rank" data-fltrselectedcurrency="GBP" data-category="Category1" data-productidstodisableshortcutbuttons="976516" data-defaultpagelength="100" data-searchtermcategory="" data-noofitemsingtmpost="25">
    <ul id="navlist" class="s-productscontainer2">
"""

soup = BeautifulSoup(doc, 'html.parser')
print(soup.select_one('div#productlistcontainer')['data-category'])
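
One caveat with the one-liner above: select_one returns None when nothing matches, so subscripting the result fails on pages without that div. A small sketch of a more defensive variant follows; the helper name get_data_category is made up for this example, and it selects on the attribute itself rather than on the id.

```python
from bs4 import BeautifulSoup


def get_data_category(markup):
    """Return the data-category value of the first div that has one, else None.

    Hypothetical helper for illustration; not part of BeautifulSoup.
    """
    soup = BeautifulSoup(markup, "html.parser")
    # CSS attribute selector: any div that carries a data-category attribute.
    div = soup.select_one("div[data-category]")
    if div is None:
        return None
    return div.get("data-category")


print(get_data_category('<div id="productlistcontainer" data-category="Category1"></div>'))
print(get_data_category("<p>no product list here</p>"))
```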

Upvotes: 2
