moster67
moster67

Reputation: 850

Can't get a regex pattern to work in Python

I have the following (repeating) HTML text from which I need to extract some values using Python and regular expressions.

<tr>
<td width="35%">Demand No</td>
<td width="65%"><input type="text" name="T1" size="12" onFocus="this.blur()" value="876716001"></td>
</tr>

I can get the first value by using

match_det = re.compile(r'<td width="35.+?">(.+?)</td>').findall(html_source_det)

But the above is on one line. However, I also need to get the second value which is on the line following the first one but I cannot get it to work. I have tried the following, but I won't get a match

match_det = re.compile('<td width="35.+?">(.+?)</td>\n'
                       '<td width="65.+?value="(.+?)"></td>').findall(html_source_det)

Perhaps I am unable to get it to work since the text is multiline, but I added "\n" at the end of the first line, so I thought this would resolve it but it did not.

What I am doing wrong?

The html_source is retrieved downloading it (it is not a static HTML file like outlined above - I only put it here so you could see the text). Maybe this is not the best way in getting the source.

I am obtaining the html_source like this:

new_url = "https://webaccess.site.int/curracc/" + url_details #not a real url
myresponse_det = urllib2.urlopen(new_url)
html_source_det = myresponse_det.read()

Upvotes: 1

Views: 100

Answers (1)

heinst
heinst

Reputation: 8786

Please do not try to parse HTML with regex, as it is not regular. Instead use an HTML parsing library like BeautifulSoup. It will make your life a lot easier! Here is an example with BeautifulSoup:

from bs4 import BeautifulSoup

html = '''<tr>
<td width="35%">Demand No</td>
<td width="65%"><input type="text" name="T1" size="12" onFocus="this.blur()" value="876716001"></td>
</tr>'''

soup = BeautifulSoup(html)
print soup.find('td', attrs={'width': '65%'}).findNext('input')['value']

Or more simply:

print soup.find('input', attrs={'name': 'T1'})['value']

Upvotes: 3

Related Questions