Reputation: 51
I'm trying to extract a value from a html table using bs4
, however the structure of the table is in the form of:
<td class="celda400" vAlign="center" align="right" width="100" bgColor="#DFEDFF" style="color:Black">
575,42
</td>
The value I'm interested in is 575,42
, however it has no id or other identifier to be used by bs4
to be extracted.
How can I call this value? Or under what id?
Upvotes: 2
Views: 174
Reputation: 1560
You can try it. I think, you can understand it:
from bs4 import BeautifulSoup
html_doc = """
<td class="celda400" vAlign="center" align="right" width="100" bgColor="#DFEDFF" style="color:Black">
575,42
</td>
<td class="celda400" vAlign="center" align="right" width="100" bgColor="#DFEDFF" style="color:Black">
875,42
</td>
"""
soup = BeautifulSoup(html_doc, 'lxml')
all_td = soup.find_all('td', {'class':"celda400"})
for td in all_td:
value = td.text.strip()
print(value)
Upvotes: 1
Reputation: 2469
Another solution.
from simplified_scrapy import SimplifiedDoc,req,utils
html = '''
<td class="celda400" vAlign="center" align="right" width="100" bgColor="#DFEDFF" style="color:Black">
575,42
</td>
<td class="celda400" vAlign="center" align="right" width="100" bgColor="#DFEDFF" style="color:Black">
575,43
</td>
'''
doc = SimplifiedDoc(html)
texts = doc.selects('td.celda400').text
print (texts)
Result:
['575,42', '575,43']
Here are more examples. https://github.com/yiyedata/simplified-scrapy-demo/blob/master/doc_examples
Upvotes: 1
Reputation: 581
You can use any of the attributes to extract. For example, to use the
class = "celda400"
attribute
response.find('td', {'class':"celda400"}).string
Upvotes: 1