I am using urllib2 to fetch a web page, and I need to find a specific value in the returned data.
Is it better to use Beautiful Soup and its find method, or to search the data with a regular expression?
Here is a very basic example of the text returned by the request:
<html>
<body>
<table>
<tbody>
<tr>
<td>
<div id="123" class="services">
<table>
<tbody>
<tr>
<td style="PADDING-LEFT: 5px"bgcolor="ffffff" class="style8"> Example BLAB BLAB BLAB </td>
<td style="PADDING-LEFT: 5px"bgcolor="ffffff" class="style8"> BLAB BLAB BLAB </td>
<td style="PADDING-LEFT: 5px"bgcolor="ffffff" class="style8"> BLAB BLAB BLAB </td>
</tr>
<tr>
<td style="PADDING-LEFT: 5px"bgcolor="ffffff" class="style8"> BLAB BLAB BLAB </td>
<td style="PADDING-LEFT: 5px"bgcolor="ffffff" class="style8"> BLAB BLAB BLAB </td>
<td style="PADDING-LEFT: 5px"bgcolor="ffffff" class="style8"> BLAB BLAB BLAB </td>
</tr>
<tr>
<td style="PADDING-LEFT: 5px"bgcolor="ffffff" class="style8"> BLAB BLAB BLAB </td>
<td style="PADDING-LEFT: 5px"bgcolor="ffffff" class="style8"> BLAB BLAB BLAB </td>
<td style="PADDING-LEFT: 5px"bgcolor="ffffff" class="style8"> BLAB BLAB BLAB </td>
</tr>
</tbody>
</table>
</div>
</td>
</tr>
</tbody>
</body>
</html>
In this case I want to return "Example BLAB BLAB BLAB". The only part that stays the same is the word "Example", and I want to return all of the text inside that particular tag.
Don't use regular expressions to parse HTML/XML.
With BeautifulSoup, you can use a CSS selector:
>>> from bs4 import BeautifulSoup
>>>
>>> html_str = '''
... <html>
... <body>
... <td style="PADDING-LEFT: 5px"bgcolor="ffffff" class="style8"> Example BLAB BLAB BLAB </td>
... <td style="PADDING-LEFT: 5px"bgcolor="ffffff" class="style8"> BLAB BLAB BLAB </td>
... <td style="PADDING-LEFT: 5px"bgcolor="ffffff" class="style8"> BLAB BLAB BLAB </td>
... <td style="PADDING-LEFT: 5px"bgcolor="ffffff" class="style8"> BLAB BLAB BLAB </td>
... </body>
... </html>
... '''
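>>> soup = BeautifulSoup(html_str, 'html.parser')  # explicit parser avoids the "no parser specified" warning
>>> for td in soup.select('.style8'):
...     print(td.get_text(strip=True))  # strip the surrounding spaces inside each cell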
...
Example BLAB BLAB BLAB
BLAB BLAB BLAB
BLAB BLAB BLAB
BLAB BLAB BLAB
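The loop above prints every `style8` cell, but the question asks only for the one containing "Example". Below is a minimal sketch of narrowing the search, assuming (as the question states) that the word "Example" is the only stable part of the target cell; the helper name `find_example_cell` and the trimmed sample markup are hypothetical. It searches text nodes first and then walks up to the nearest `<td>`, so the outer layout cell that wraps the whole inner table is not returned by mistake.

```python
from bs4 import BeautifulSoup

# Trimmed sample based on the question; in practice this is the body read from the HTTP response.
html_str = '''
<table><tbody><tr><td>
<div id="123" class="services">
<table><tbody><tr>
<td style="PADDING-LEFT: 5px"bgcolor="ffffff" class="style8"> Example BLAB BLAB BLAB </td>
<td style="PADDING-LEFT: 5px"bgcolor="ffffff" class="style8"> BLAB BLAB BLAB </td>
</tr></tbody></table>
</div>
</td></tr></tbody></table>
'''


def find_example_cell(html, marker="Example"):
    """Return the stripped text of the <td> containing `marker`, or None (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # Search text nodes, since only the word "Example" is said to stay constant.
    node = soup.find(string=lambda s: marker in s)
    if node is None:
        return None
    # Walk up to the nearest enclosing <td>, i.e. the inner cell, not the outer layout cell.
    cell = node.find_parent("td")
    return cell.get_text(strip=True) if cell is not None else None


print(find_example_cell(html_str))  # Example BLAB BLAB BLAB
```

If you prefer the `find` method mentioned in the question, `soup.find('td', string=re.compile('Example'))` (with `import re`) also works here, but it only matches a cell whose entire content is a single text node.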