Reputation: 182
I started playing with Python and came across something that should be very simple, but I cannot make it work. I have the HTML below:
<h2 class="sr-only">Available Products</h2>
<div id="productlistcontainer" data-defaultpageno="1" data-descfilter="" class="columns4 columnsmobile2" data-noproductstext="No Products Found" data-defaultsortorder="rank" data-fltrselectedcurrency="GBP" data-category="Category1" data-productidstodisableshortcutbuttons="976516" data-defaultpagelength="100" data-searchtermcategory="" data-noofitemsingtmpost="25">
<ul id="navlist" class="s-productscontainer2">
What I need is to use parser.xpath to get the value of the data-category attribute.
I'm trying, for example:
cgy = xpath('//div["data-category"]')
What am I doing wrong?
Upvotes: 0
Views: 761
Reputation: 897
Personally I use lxml html to do my parsing because, in my opinion, it is fast and easy to work with. I could have shortened how the category is extracted, but I wanted to show as much detail as possible so you can understand what is going on.
from lxml import html
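
def extract_data_category(tree):
    # cssselect() needs the separate "cssselect" package installed alongside lxml
    elements = [
        e
        for e in tree.cssselect('div#productlistcontainer')
        if e.get('data-category') is not None
    ]
    # Take the first matching div (raises IndexError if nothing matched)
    element = elements[0]
    content = element.get('data-category')
    return content
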
response = """
<h2 class="sr-only">Available Products</h2>
<div id="productlistcontainer" data-defaultpageno="1" data-descfilter="" class="columns4 columnsmobile2" data-noproductstext="No Products Found" data-defaultsortorder="rank" data-fltrselectedcurrency="GBP" data-category="Category1" data-productidstodisableshortcutbuttons="976516" data-defaultpagelength="100" data-searchtermcategory="" data-noofitemsingtmpost="25">
<ul id="navlist" class="s-productscontainer2">
"""
tree = html.fromstring(response)
data_category = extract_data_category(tree)
print(data_category)
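Since the question asks about XPath specifically: `//div["data-category"]` uses a plain string as the predicate, and a non-empty string is always true in XPath, so it matches every div instead of reading the attribute. Attributes are addressed with `@`. Below is a minimal sketch of that, assuming lxml is the parser; the function name `data_category_via_xpath` and the shortened `SAMPLE` markup are made up for illustration and are not from the original page.

```python
from lxml import html

# Shortened, well-formed stand-in for the markup in the question (assumption).
SAMPLE = (
    '<div id="productlistcontainer" data-category="Category1">'
    '<ul id="navlist"></ul>'
    '</div>'
)


def data_category_via_xpath(markup):
    """Return the data-category value of div#productlistcontainer, or None."""
    tree = html.fromstring(markup)
    # '@id' filters on an attribute; ending the path with '/@data-category'
    # makes xpath() return the attribute values instead of the elements.
    values = tree.xpath('//div[@id="productlistcontainer"]/@data-category')
    return values[0] if values else None


print(data_category_via_xpath(SAMPLE))
```

If you only want any div that carries the attribute, `//div[@data-category]/@data-category` should work the same way.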
Upvotes: 1
Reputation: 33384
Try Selenium WebDriver with Python.
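from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("url here")
# find_element_by_xpath was removed in Selenium 4.3; find_element with By.XPATH replaces it
element = driver.find_element(By.XPATH, "//div[@id='productlistcontainer']")
print(element.get_attribute('data-category'))
driver.quit()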
Or you can use BeautifulSoup, which is a Python library.
from bs4 import BeautifulSoup
doc = """
<h2 class="sr-only">Available Products</h2>
<div id="productlistcontainer" data-defaultpageno="1" data-descfilter="" class="columns4 columnsmobile2" data-noproductstext="No Products Found" data-defaultsortorder="rank" data-fltrselectedcurrency="GBP" data-category="Category1" data-productidstodisableshortcutbuttons="976516" data-defaultpagelength="100" data-searchtermcategory="" data-noofitemsingtmpost="25">
<ul id="navlist" class="s-productscontainer2">
"""
soup = BeautifulSoup(doc, 'html.parser')
print(soup.select_one('div#productlistcontainer')['data-category'])
Upvotes: 2