Reputation: 185
I have the regex below to extract text from an HTML tag, but it doesn't yield the expected result.
HTML Tag:
<td>Issue Amount</td>
<td>:</td>
<td>20,000,000.00</td>
Find = re.findall(r"(?<=Issue Amount</td> <td>:</td> <td>) ([0-9,])", soup_string)[0]
I need to get the numerical value 20,000,000.00 from this tag.
Any advice on what I am doing wrong here? I tried a couple of other approaches, but with no success.
Upvotes: 1
Views: 29
Reputation: 185
The regex below gave me the desired output. Thanks, all, for your input.
(?<=Issue Amount[td\W]{21})([\d,.]+)
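For anyone reusing this pattern, here is a small sketch of how it behaves. The sample strings are assumptions built from the HTML in the question: one with plain \n line endings and one with \r\n. The {21} quantifier counts every character between the label and the number, so the pattern only matches when that gap is exactly 21 characters long.

```python
import re

# Pattern from this answer: a fixed-width lookbehind of "Issue Amount"
# followed by exactly 21 characters that are 't', 'd' or non-word characters.
pattern = r"(?<=Issue Amount[td\W]{21})([\d,.]+)"

# Assumed sample input with Unix line endings (21 characters in the gap).
unix_html = "<td>Issue Amount</td>\n<td>:</td>\n<td>20,000,000.00</td>"

# Same markup with Windows line endings (23 characters in the gap).
windows_html = "<td>Issue Amount</td>\r\n<td>:</td>\r\n<td>20,000,000.00</td>"

unix_result = re.findall(pattern, unix_html)
windows_result = re.findall(pattern, windows_html)

print(unix_result)
print(windows_result)
```

The second call finds nothing, so this approach is only safe if the whitespace in the page never changes.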
Upvotes: 0
Reputation: 4013
Do not, under any circumstances, try to parse XML with a regex unless you wish to invoke rite 666: Ph'nglui mglw'nafh Cthulhu R'lyeh wgah'nagl fhtagn.
Use an HTML parsing library instead; see this page for some ways to do it.
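As a rough illustration of the parser route, here is a minimal sketch using only the standard library's html.parser module. The CellCollector class and the sample_html string are hypothetical names made up for this example, and the sketch assumes the value always sits two cells after the "Issue Amount" label, as in the question.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collects the text of every <td> cell in document order."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self.cells.append("")  # start a new, empty cell

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self.cells[-1] += data  # append text to the current cell


# Assumed sample input, copied from the question.
sample_html = "<td>Issue Amount</td>\n<td>:</td>\n<td>20,000,000.00</td>"

parser = CellCollector()
parser.feed(sample_html)
parser.close()

cells = [cell.strip() for cell in parser.cells]
label_index = cells.index("Issue Amount")
issue_amount = cells[label_index + 2]  # skip the ":" cell
print(issue_amount)
```

Because the parser works on cells rather than raw characters, the whitespace between the tags no longer matters.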
However, in your case the regex fails because it looks for a single space between your </td> and <td> tags, whereas your data has line breaks (carriage returns) there. You can use the \s metacharacter to match any whitespace character.
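If you do stay with a regex, the \s fix could look roughly like the sketch below. The soup_string value is an assumed sample based on the question. Note that Python's re module only accepts fixed-width lookbehinds, so \s* cannot go inside (?<=...); this sketch uses a capturing group instead.

```python
import re

# Assumed sample input; the line endings could be \n or \r\n.
soup_string = "<td>Issue Amount</td>\r\n<td>:</td>\r\n<td>20,000,000.00</td>"

# \s* matches any run of whitespace (spaces, tabs, \r, \n) between the tags.
# The parentheses capture the digits, commas and dots of the amount.
pattern = re.compile(r"Issue Amount</td>\s*<td>:</td>\s*<td>([0-9,.]+)</td>")

match = pattern.search(soup_string)
amount = match.group(1) if match else None
print(amount)
```

Checking match before calling group avoids an exception when the label is missing from the page.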
Upvotes: 2