Shashi Shankar Singh

Reputation: 185

Unable to accurately search for a particular text in an HTML tag using Python

I have the regex below to extract text from an HTML tag, but it doesn't yield the expected result.

HTML Tag:

<td>Issue Amount</td>
<td>:</td>
<td>20,000,000.00</td>

Find = re.findall(?<=Issue Amount</td> <td>:</td> <td>) [0-9,]),soup_string)[0]

I need to get the numerical value 20,000,000.00 from this tag.

Any advice on what I am doing wrong here? I tried a couple of other approaches, but with no success.

Upvotes: 1

Views: 29

Answers (2)

Shashi Shankar Singh

Reputation: 185

Below is the regex that gave me the desired output. Thanks, all, for your input.

(?<=Issue Amount[td\W]{21})([\d,.]+)
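For anyone wanting to try it, here is a minimal sketch of that pattern in use. The `html` string is an assumption: it reproduces the snippet from the question with exactly one newline between the cells, which is what makes the fixed count of 21 characters line up.

```python
import re

# Assumed input: the cells from the question, separated by single newlines.
html = "<td>Issue Amount</td>\n<td>:</td>\n<td>20,000,000.00</td>"

# The lookbehind skips exactly 21 characters drawn from 't', 'd' and
# non-word characters (the tags, the colon and the two newlines).
pattern = re.compile(r"(?<=Issue Amount[td\W]{21})([\d,.]+)")

match = pattern.search(html)
amount = match.group(1) if match else None
print(amount)
```

Because the count is hard-coded, the pattern is tied to this exact layout: with `\r\n` line endings or extra indentation between the cells, it would no longer match.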

Upvotes: 0

JGNI

Reputation: 4013

Do not under any circumstances try to parse XML with a regex unless you wish to invoke rite 666 Ph'nglui mglw'nafh Cthulhu R'lyeh wgah'nagl fhtagn.

Use an HTML parsing library; see this page for some ways to do it.
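As a rough illustration of the parser route, here is a sketch using only the standard library's `html.parser`. The `CellCollector` class and the `value_after_label` helper are hypothetical names made up for this example, and the code assumes the layout from the question: a label cell, a `:` cell, then the value cell.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Hypothetical helper: collects the text of every <td> in document order."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self.cells.append("")  # start a new, empty cell

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self.cells[-1] += data  # append text to the current cell


def value_after_label(html, label):
    """Return the cell two positions after `label` (label, ':', value), or None."""
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    cells = [cell.strip() for cell in parser.cells]
    if label not in cells:
        return None
    index = cells.index(label) + 2
    return cells[index] if index < len(cells) else None


# Assumed input, copied from the question.
html = "<td>Issue Amount</td>\n<td>:</td>\n<td>20,000,000.00</td>"
amount = value_after_label(html, "Issue Amount")
print(amount)
```

Whitespace between the tags no longer matters here, since the parser deals with the markup and the code only looks at cell text.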

However, in your case the regex fails because it looks for a single space between your </td> and <td> tags, whereas your data has line breaks there. You can use the \s meta-character to match any whitespace character.
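If you do stay with a regex, something along these lines should work. It is a sketch, not a drop-in fix: the `html` string is assumed from the question, and the lookbehind is swapped for a capture group because Python's `re` only accepts fixed-width lookbehind, so `\s*` cannot go inside `(?<=...)`. The character class also gets a `.` and a `+` so that the whole number is captured.

```python
import re

# Assumed input, copied from the question.
html = "<td>Issue Amount</td>\n<td>:</td>\n<td>20,000,000.00</td>"

# \s* tolerates any run of whitespace (spaces, tabs, \n, \r\n) between tags;
# the parentheses capture the digits, commas and decimal point of the value.
pattern = re.compile(r"Issue Amount</td>\s*<td>:</td>\s*<td>([\d,.]+)")

match = pattern.search(html)
amount = match.group(1) if match else None
print(amount)
```

Checking `match` before calling `group` avoids the exception you would get from indexing an empty result with `[0]`.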

Upvotes: 2
