I am working on a Python project that uses lxml to scrape a page, and I am having trouble retrieving the value of a span's class attribute. The HTML snippet is below:
<tr class="nogrid">
<td class="date">12th January 2016</td>
<td class="time">11:22pm</td>
<td class="category">Clothing</td>
<td class="product">
<span class="brand">carlos santos</span>
</td>
<td class="size">10</td>
<td class="name">polo</td>
</tr>
....
How do I retrieve the value of the span's class attribute below:
<span class="brand">carlos santos</span>
You can use the following XPath to get the class attribute of the span element that is a direct child of the td with class "product":
//td[@class="product"]/span/@class
Working demo:
from lxml import html
raw = '''<tr class="nogrid">
<td class="date">12th January 2016</td>
<td class="time">11:22pm</td>
<td class="category">Clothing</td>
<td class="product">
<span class="brand">carlos santos</span>
</td>
<td class="size">10</td>
<td class="name">polo</td>
</tr>'''
root = html.fromstring(raw)
# Selecting @class returns a list of attribute strings; take the first match
span_class = root.xpath('//td[@class="product"]/span/@class')[0]
print(span_class)
Output:
brand
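If the real page repeats this row many times (the "...." in the question), a sketch along the following lines collects every brand span. It is an illustration under assumptions: the sample markup is hypothetical, and the contains(concat(...)) test is a common XPath 1.0 idiom added here (not part of the answer above) so that a td with extra class tokens, such as class="product featured", still matches. Selecting the span element instead of its @class string also gives access to the text, and checking for an empty result avoids the IndexError that [0] raises when a row has no span.

```python
from lxml import html

# Hypothetical fragment with several rows: the second row has no span,
# and the third <td> carries an extra class token.
raw = '''<table>
<tr class="nogrid"><td class="product"><span class="brand">carlos santos</span></td></tr>
<tr class="nogrid"><td class="product">no brand here</td></tr>
<tr class="nogrid"><td class="product featured"><span class="brand label">polo</span></td></tr>
</table>'''

root = html.fromstring(raw)

# Match <td> elements whose class list contains the token "product",
# even when other class tokens are present.
PRODUCT_TD = '//td[contains(concat(" ", normalize-space(@class), " "), " product ")]'

results = []
for td in root.xpath(PRODUCT_TD):
    # Select the span element itself so both @class and its text are available.
    spans = td.xpath('./span')
    if not spans:
        continue  # indexing [0] on an empty list would raise IndexError
    span = spans[0]
    results.append((span.get('class'), span.text_content()))

print(results)
```

Note that span.get('class') returns the raw attribute string, so a multi-token class comes back as "brand label" rather than as a list.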
Alternatively, with BeautifulSoup:
from bs4 import BeautifulSoup
raw = '''<tr class="nogrid">
<td class="date">12th January 2016</td>
<td class="time">11:22pm</td>
<td class="category">Clothing</td>
<td class="product">
<span class="brand">carlos santos</span>
</td>
<td class="size">10</td>
<td class="name">polo</td>
</tr>'''
soup = BeautifulSoup(raw, 'lxml')
# BeautifulSoup treats class as a multi-valued attribute, so this is a list
result = soup.find('span')['class']  # result = ['brand']
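Because BeautifulSoup splits class into a list of tokens, you usually index or join the result to get a plain string. The sketch below assumes a hypothetical span with two class tokens and uses Python's built-in 'html.parser' backend instead of 'lxml'; find() with class_ matches any single token of the td's class list.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the span carries two class tokens.
raw = '''<tr class="nogrid">
<td class="product">
<span class="brand label">carlos santos</span>
</td>
</tr>'''

soup = BeautifulSoup(raw, 'html.parser')

# class_ matches when "product" is one of the td's class tokens.
span = soup.find('td', class_='product').find('span')

classes = span['class']      # list of tokens, e.g. ['brand', 'label']
first = classes[0]           # first token only
joined = ' '.join(classes)   # original attribute string

print(first, '|', joined)
```

Using span.get('class') instead of span['class'] returns None rather than raising KeyError when the attribute is missing.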