Reputation: 139832
For example, I want to extract key/value pairs from HTML like the following:
<tr>
<td id="td3" class="td3" bgcolor="#FFFFFF" colspan="4">■ Related Information </td>
</tr>
<tr>
<td id="td5" class="td5" width="10%">job title:</td>
<td id="td5" class="td5" width="90%" colspan="3">Sales Representitive</td>
</tr>
<tr>
<td id="td5" class="td5" width="10%">Date:</td>
<td id="td5" class="td5" width="40%">2009-9-15</td>
</tr>
<tr>
<td id="td5" class="td5" width="10%">Location:</td>
<td id="td5" class="td5" width="40%">Jiangyin</td>
</tr>
<tr>
<td id="td5" class="td5" width="10%">Degree:</td>
<td id="td5" class="td5" width="40%">Bachelor</td>
<td id="td5" class="td5" width="10%">Major:</td>
<td id="td5" class="td5" width="40%">No limit</td>
</tr>
<tr>
<td id="td5" class="td5" width="10%">Sex:</td>
<td id="td5" class="td5" width="40%">No limit</td>
</tr>
<tr>
<td id="td5" class="td5" width="10%">Type:</td>
<td id="td5" class="td5" width="40%">Fulltime</td>
<td id="td5" class="td5" width="10%"></td>
<td id="td5" class="td5" width="40%"></td>
</tr>
I'm tired of writing long regular expressions. Is there an easier way to do this?
Upvotes: 0
Views: 138
Reputation: 66851
You should try the lesser-known PHP Simple HTML DOM Parser. It lets you do things like this:
// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');
// Find all images
foreach ($html->find('img') as $element)
    echo $element->src . '<br>';
// Find all links
foreach ($html->find('a') as $element)
    echo $element->href . '<br>';
// Create DOM from string
$html = str_get_html('<div id="hello">Hello</div><div id="world">World</div>');
$html->find('div', 1)->class = 'bar';
$html->find('div[id=hello]', 0)->innertext = 'foo';
echo $html; // Output: <div id="hello">foo</div><div id="world" class="bar">World</div>
Upvotes: 1
Reputation: 5295
You could use some simple regular expressions:
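$values = array();
// First pass: capture the contents of each table row.
if (preg_match_all("/<tr>(.*?)<\/tr>/is", $html, $matches)) {
    foreach ($matches[1] as $match) {
        // Second pass: capture the text of every non-empty cell in the row.
        if (preg_match_all("/<td[^>]*>([^<]+)<\/td>/is", $match, $tds)) {
            array_push($values, $tds[1]);
        }
    }
}
var_dump($values);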
It is a lot simpler when you split the work into separate patterns instead of using one single large pattern.
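If you want actual key/value pairs rather than a list of cell texts, one possible follow-up is to group each row's cells two at a time. This is only a sketch: the sample `$html` is a shortened version of the question's table, `$pairs` is a name I made up, and it assumes every label cell is immediately followed by its value cell.

```php
<?php
// Shortened sample in the same shape as the question's table.
$html = '<tr><td width="10%">Date:</td><td width="40%">2009-9-15</td></tr>'
      . '<tr><td width="10%">Degree:</td><td width="40%">Bachelor</td>'
      . '<td width="10%">Major:</td><td width="40%">No limit</td></tr>';

$pairs = [];
if (preg_match_all('/<tr>(.*?)<\/tr>/is', $html, $rows)) {
    foreach ($rows[1] as $row) {
        if (preg_match_all('/<td[^>]*>([^<]+)<\/td>/is', $row, $tds)) {
            // array_chunk groups the flat cell list into [label, value] pairs.
            foreach (array_chunk($tds[1], 2) as $chunk) {
                if (count($chunk) === 2) {
                    // Drop the trailing colon from the label.
                    $pairs[rtrim(trim($chunk[0]), ':')] = trim($chunk[1]);
                }
            }
        }
    }
}
print_r($pairs);
```

Note that `[^<]+` skips empty cells, so a label whose value cell is empty would throw the pairing off by one. For markup like that, a real parser is probably the safer choice.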
Upvotes: 2
Reputation: 655129
Use an HTML or XML parser like DOMDocument or SimpleXML. Then you can simply traverse the DOM and fetch the data you want.
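As a rough illustration of the DOMDocument route, here is a minimal sketch. It assumes the cells in each row alternate label, value, label, value, as in the question; `extractPairs` is a hypothetical helper name and the sample markup is a shortened stand-in for the real page.

```php
<?php
// Shortened sample in the same shape as the question's table.
$html = <<<HTML
<table>
<tr><td class="td5">Date:</td><td class="td5">2009-9-15</td></tr>
<tr><td class="td5">Degree:</td><td class="td5">Bachelor</td><td class="td5">Major:</td><td class="td5">No limit</td></tr>
<tr><td class="td5">Type:</td><td class="td5">Fulltime</td><td class="td5"></td><td class="td5"></td></tr>
</table>
HTML;

// Hypothetical helper: reads the cells of each row two at a time as label/value.
function extractPairs(string $html): array
{
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $pairs = [];
    foreach ($doc->getElementsByTagName('tr') as $row) {
        $cells = [];
        foreach ($row->getElementsByTagName('td') as $cell) {
            $cells[] = trim($cell->textContent);
        }
        for ($i = 0; $i + 1 < count($cells); $i += 2) {
            $key = rtrim($cells[$i], ':');
            if ($key !== '') { // skip the empty filler cells
                $pairs[$key] = $cells[$i + 1];
            }
        }
    }
    return $pairs;
}

print_r(extractPairs($html));
```

Because the empty cells are kept in `$cells`, a label with an empty value still pairs up correctly. If the page contains non-ASCII text, `loadHTML` may need an encoding hint to avoid mangled characters.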
Upvotes: 5