Reputation: 1331
How can I extract the upvote (215) and downvote (82) counts from the following HTML snippet using a Python regular expression?
<span class="vote-actions">
<a class="btn btn-default vote-action-good">
<span class="icon thumb-up black black-hover"> </span>
<span class="rating-inbtn">215</span>
</a>
<a class="btn btn-default vote-action-bad">
<span class="icon thumb-down grey black-hover"> </span>
<span class="rating-inbtn">82</span>
</a>
</span>
I have formatted the HTML for readability, but the original code contains no '\n' or '\t' characters.
FYI, I am not looking for a Beautiful Soup solution. I am looking for one that uses Python's re.search function.
Upvotes: 1
Views: 57
Reputation: 4383
Don't use regex to parse HTML: https://stackoverflow.com/a/1732454/412529
Here's how to do it with BeautifulSoup:
import bs4
html = '''<span class="vote-actions">...'''
# Name the parser explicitly; without it bs4 guesses one and emits a warning
soup = bs4.BeautifulSoup(html, "html.parser")
soup.select("a.vote-action-good span.rating-inbtn")[0].text # '215'
soup.select("a.vote-action-bad span.rating-inbtn")[0].text # '82'
Upvotes: 2
Reputation: 142744
To find both numbers I would do:
text = '''<span class="vote-actions">
<a class="btn btn-default vote-action-good">
<span class="icon thumb-up black black-hover"> </span>
<span class="rating-inbtn">215</span>
</a>
<a class="btn btn-default vote-action-bad">
<span class="icon thumb-down grey black-hover"> </span>
<span class="rating-inbtn">82</span>
</a>
</span>'''
import re
# Raw string so that \d is not treated as a string escape
a = re.findall(r'rating-inbtn">(\d+)', text)
print(a)  # ['215', '82']
In the HTML, the first number is the upvote count and the second is the downvote count, so I don't need a better method.
up = a[0]
down = a[1]
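If relying on the order feels fragile, a possible variant uses re.search (as the question asks) and ties each number to its vote-action-good or vote-action-bad link. This is only a sketch: the helper name vote_count is made up for illustration, the markup is assumed to look exactly like the snippet in the question, and the pattern would pick up the next link's number if a link had no rating-inbtn span.

```python
import re

# Same snippet as in the question, on one line (no '\n' or '\t').
html = ('<span class="vote-actions">'
        '<a class="btn btn-default vote-action-good">'
        '<span class="icon thumb-up black black-hover"> </span>'
        '<span class="rating-inbtn">215</span></a>'
        '<a class="btn btn-default vote-action-bad">'
        '<span class="icon thumb-down grey black-hover"> </span>'
        '<span class="rating-inbtn">82</span></a></span>')


def vote_count(html, kind):
    """Hypothetical helper: return the count for kind 'good' or 'bad', or None."""
    # Find the link class first, then lazily skip to the nearest rating span.
    pattern = r'vote-action-' + re.escape(kind) + r'\b.*?rating-inbtn">\s*(\d+)'
    match = re.search(pattern, html, re.DOTALL)
    return int(match.group(1)) if match else None


up = vote_count(html, 'good')
down = vote_count(html, 'bad')
print(up, down)
```

re.DOTALL lets the lazy `.*?` cross line breaks, so the same pattern should also work on the formatted version of the snippet.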
If that is not enough, I would use an HTML parser:
text = '''<span class="vote-actions">
<a class="btn btn-default vote-action-good">
<span class="icon thumb-up black black-hover"> </span>
<span class="rating-inbtn">215</span>
</a>
<a class="btn btn-default vote-action-bad">
<span class="icon thumb-down grey black-hover"> </span>
<span class="rating-inbtn">82</span>
</a>
</span>'''
import lxml.html
# Parse the snippet into an element tree
tree = lxml.html.fromstring(text)
# @class="..." is an exact string comparison, so the full class attribute is spelled out
up = tree.xpath('//a[@class="btn btn-default vote-action-good"]/span[@class="rating-inbtn"]')
up = up[0].text
print(up)  # 215
down = tree.xpath('//a[@class="btn btn-default vote-action-bad"]/span[@class="rating-inbtn"]')
down = down[0].text
print(down)  # 82
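If installing a third-party parser is not an option, roughly the same idea can be sketched with html.parser from the standard library. The class name VoteParser and its attributes are made up for this example, and it assumes the markup is shaped like the snippet in the question (one rating-inbtn span inside each vote link).

```python
from html.parser import HTMLParser

# Same snippet as in the question, on one line.
html = ('<span class="vote-actions">'
        '<a class="btn btn-default vote-action-good">'
        '<span class="icon thumb-up black black-hover"> </span>'
        '<span class="rating-inbtn">215</span></a>'
        '<a class="btn btn-default vote-action-bad">'
        '<span class="icon thumb-down grey black-hover"> </span>'
        '<span class="rating-inbtn">82</span></a></span>')


class VoteParser(HTMLParser):
    """Hypothetical parser: collect span.rating-inbtn text per vote link."""

    def __init__(self):
        super().__init__()
        self.votes = {}          # filled as {'good': ..., 'bad': ...}
        self._kind = None        # 'good' or 'bad' while inside a vote link
        self._in_rating = False  # True while inside span.rating-inbtn

    def handle_starttag(self, tag, attrs):
        # Split the class attribute so individual class names can be tested.
        classes = (dict(attrs).get('class') or '').split()
        if tag == 'a':
            for kind in ('good', 'bad'):
                if 'vote-action-' + kind in classes:
                    self._kind = kind
        elif tag == 'span' and self._kind and 'rating-inbtn' in classes:
            self._in_rating = True

    def handle_endtag(self, tag):
        if tag == 'a':
            self._kind = None
        elif tag == 'span':
            self._in_rating = False

    def handle_data(self, data):
        if self._in_rating and data.strip():
            self.votes[self._kind] = data.strip()


parser = VoteParser()
parser.feed(html)
print(parser.votes.get('good'), parser.votes.get('bad'))
```

Unlike the exact @class comparison in the XPath version, this tests individual class names, so it should keep working if the order of the classes changes.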
Upvotes: 2