UnuSec

Reputation: 205

How to extract the price from html using regex in python

I have HTML output that contains this:

<span class="value">
            Price:<br>
            <span style="color:white">23,07€ </span>
        </span>

I tried to extract the prices using:

prices = re.findall(r'<span class="value">.*?(\d{1,3}\.?\d{1,2}).*?</span>',search_result)

Sometimes the decimals are replaced with -- when they would be 00. I also need the two numbers the expression extracts, 23 and 07, joined together as 2307.
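Here is a quick check of why the attempt above misses the price. It assumes the HTML snippet is the entire search_result string. Without re.DOTALL the pattern finds nothing, because . does not match the newline right after <span class="value">. With re.DOTALL it does match, but \.? only allows a dot, so the comma in 23,07 cuts the capture short.

```python
import re

# The HTML from the question, assumed here to be the whole of search_result.
search_result = """<span class="value">
            Price:<br>
            <span style="color:white">23,07€ </span>
        </span>"""

pattern = r'<span class="value">.*?(\d{1,3}\.?\d{1,2}).*?</span>'

# "." does not match "\n" by default, so the text after the opening tag is never reached.
print(re.findall(pattern, search_result))             # []

# With DOTALL the match succeeds, but "\.?" cannot consume ",", so only "23" is captured.
print(re.findall(pattern, search_result, re.DOTALL))  # ['23']
```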

Thank you for your time.

Upvotes: 0

Views: 1249

Answers (1)

Braj

Reputation: 46841

Get the matched group at index 1:

(?<=>)(\d[^€]*)
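Here is a minimal sketch of the lookbehind pattern applied to the question's HTML, followed by one possible way to get the joined form. The post-processing step is an assumption based on the question's description, not part of the pattern: -- is treated as 00 and the comma is dropped. Also note that [^€]* matches anything except €, including newlines and tags. The pattern therefore relies on every price being followed by a € sign.

```python
import re

html = """<span class="value">
            Price:<br>
            <span style="color:white">23,07€ </span>
        </span>"""

# Lookbehind: start at a digit that directly follows ">", then take everything up to the euro sign.
raw_prices = re.findall(r'(?<=>)(\d[^€]*)', html)
print(raw_prices)  # ['23,07']

# Assumed post-processing: "--" stands for "00", and the comma is removed to join the parts.
joined = [p.replace('--', '00').replace(',', '') for p in raw_prices]
print(joined)  # ['2307']
```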

Or get matched groups 1 and 2 for the two parts of each number:

(?<=>)(\d+)\D(\d+)\D
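With two groups, re.findall returns one (whole, decimals) tuple per match. The joined 2307 form is then a simple string concatenation. This sketch uses the question's HTML. As written, both groups require digits, so a -- decimal part does not match at all. The last line below demonstrates that on a made-up input.

```python
import re

html = """<span class="value">
            Price:<br>
            <span style="color:white">23,07€ </span>
        </span>"""

# Two capturing groups, so each match comes back as a (whole, decimals) tuple.
pairs = re.findall(r'(?<=>)(\d+)\D(\d+)\D', html)
print(pairs)  # [('23', '07')]

# Concatenate the groups to get the digits-only form asked for in the question.
print([whole + decimals for whole, decimals in pairs])  # ['2307']

# Both groups need digits, so a "--" decimal part produces no match.
print(re.findall(r'(?<=>)(\d+)\D(\d+)\D', '<span>23,--€</span>'))  # []
```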

If you are only interested in the <span> tag, try the regex below:

<span [^>]*>(\d+)\D(\d+)\D[^<]*

Sample code:

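import re

# A <span> whose text begins with "<digits><non-digit><digits><non-digit>";
# the two groups capture the whole and decimal parts of the price.
p = re.compile(r'<span [^>]*>(\d+)\D(\d+)\D[^<]*')
test_str = "..."  # the HTML to search

p.findall(test_str)  # list of (whole, decimals) tuples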
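The sample code above does not yet cover the -- case from the question. Here is a possible extension, offered as a sketch rather than the answer's own method. It assumes the decimal group may be either digits or the literal --, and that -- means 00. extract_prices is a hypothetical helper name.

```python
import re

# Variant of the answer's <span> pattern: the decimal group also accepts a literal "--".
PRICE_RE = re.compile(r'<span [^>]*>(\d+)\D(\d+|--)\D')

def extract_prices(html):
    """Hypothetical helper: return each price as a joined digit string, e.g. '23,07€' -> '2307'."""
    prices = []
    for whole, decimals in PRICE_RE.findall(html):
        if decimals == '--':  # assumption: "--" is shown instead of "00"
            decimals = '00'
        prices.append(whole + decimals)
    return prices

html = """<span class="value">
            Price:<br>
            <span style="color:white">23,07€ </span>
        </span>"""
print(extract_prices(html))  # ['2307']
print(extract_prices('<span style="color:white">23,--€ </span>'))  # ['2300']
```

For anything beyond a quick scrape, an HTML parser is more robust than a regex here. For example, Python's built-in html.parser module could locate the span text first, and a small regex could then be applied to just that text.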

Upvotes: 1
