How to extract data from two similar html class elements using a regular expression?

Question

How can I extract the upvote (215) and downvote (82) counts from the following HTML snippet using a Python regular expression?

I have formatted the HTML code for readability; the original code contains no such line breaks or indentation.

FYI, I am not looking for a Beautiful Soup solution. Python's re.search function is what I am looking for.

furas · Accepted Answer

To find both numbers I would do:

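# tags reconstructed from the class names used in the selectors in this answer
text = '''
    <a class="btn btn-default vote-action-good">
        <span class="rating-inbtn">215</span>
    </a>
    <a class="btn btn-default vote-action-bad">
        <span class="rating-inbtn">82</span>
    </a>
'''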

import re

# raw string so that \d reaches the regex engine without an invalid-escape warning
a = re.findall(r'rating-inbtn">(\d+)', text)
print(a)

['215', '82']

In the HTML I can see that the first number is the upvote count and the second is the downvote count, so I don't need a more elaborate method.

up = a[0]
down = a[1]
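
If relying on the order of the matches feels fragile, a minimal sketch with re.search (which the question asked for) could anchor each match on the link's class instead. The markup here is an assumption reconstructed from the class names used in this answer, written without line breaks as the question describes, and `vote_count` is a hypothetical helper name, not part of any library.

```python
import re

# Assumed markup: built from the class names in this answer, with no
# whitespace between tags, as the question says of the original page.
html = (
    '<a class="btn btn-default vote-action-good">'
    '<span class="rating-inbtn">215</span></a>'
    '<a class="btn btn-default vote-action-bad">'
    '<span class="rating-inbtn">82</span></a>'
)


def vote_count(html, action):
    """Return the count for action 'good' or 'bad' as an int, or None."""
    # Find the link class first, then lazily skip ahead to the nearest
    # rating-inbtn span and capture the digits inside it.
    pattern = (
        r'vote-action-' + re.escape(action)
        + r'"[^>]*>.*?rating-inbtn">\s*(\d+)'
    )
    match = re.search(pattern, html, re.DOTALL)
    return int(match.group(1)) if match else None


up = vote_count(html, "good")
down = vote_count(html, "bad")
print(up, down)
```

The lazy `.*?` simply takes the next rating-inbtn span after the class name, so if a link had no such span the pattern would pick up the following link's number. That is one reason a parser is the safer choice for anything less regular than this snippet.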

If that is not enough, I would use an HTML parser:

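# tags reconstructed from the class names used in the XPath queries below
text = '''
    <a class="btn btn-default vote-action-good">
        <span class="rating-inbtn">215</span>
    </a>
    <a class="btn btn-default vote-action-bad">
        <span class="rating-inbtn">82</span>
    </a>
'''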

import lxml.html

soup = lxml.html.fromstring(text)

up = soup.xpath('//a[@class="btn btn-default vote-action-good"]/span[@class="rating-inbtn"]')
up = up[0].text
print(up)

down = soup.xpath('//a[@class="btn btn-default vote-action-bad"]/span[@class="rating-inbtn"]')
down = down[0].text
print(down)
