user1788542

Regex to scrape data from an HTML table

One site has its data in the form of a table. Its source code looks like this:

<tbody>
    <tr>
        <td></td>
        <td><a href="http://www.altassets.net/ventureforum/" target="_blank">AltAssets Venture Forum</a></td>
        <td>27 March 2014</td>
        <td>London, UK</td>
    </tr>
    <tr>
        <td></td>
        <td>AltAssets Limited Partner Summit</td>
        <td>3-4 June 2014</td>
        <td>London, UK</td>
    </tr>
    <tr>
        <td></td>
        <td>AltAssets Limited Partner Summit</td>
        <td>3-4 June 2014</td>
        <td>London, UK</td>
    </tr>
    <tr>
        <td></td>
        <td>LP-GP Forum: Infrastructure &amp; Real Estate</td>
        <td>7 October 2014</td>
        <td>London, UK</td>
    </tr>
    <tr>
        <td></td>
        <td>Envirotech &amp; Clean Energy Investor Summit</td>
        <td>4-5 November 2014</td>
        <td>London, UK</td>
    </tr>
    <tr>
        <td></td>
        <td>AltAssets Fundraising &amp; IR Forum</td>
        <td>9 December 2014</td>
        <td>Hong Kong</td>
    </tr>
</tbody>

Is it possible to write a regex that extracts the event, date, and city separately?
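A sketch of what such a pattern could look like, assuming every row has exactly four cells with an empty first cell, as in the sample above. It uses named groups (event, date, city) together with PREG_SET_ORDER so that each match comes back as one row. The shortened sample string and the $events array are illustrative only; strip_tags() and html_entity_decode() remove the <a> wrapper and turn &amp; into &.

```php
<?php
// Illustrative input: two rows copied from the table above.
$s = <<<'HTML'
<tbody>
    <tr>
        <td></td>
        <td><a href="http://www.altassets.net/ventureforum/" target="_blank">AltAssets Venture Forum</a></td>
        <td>27 March 2014</td>
        <td>London, UK</td>
    </tr>
    <tr>
        <td></td>
        <td>AltAssets Fundraising &amp; IR Forum</td>
        <td>9 December 2014</td>
        <td>Hong Kong</td>
    </tr>
</tbody>
HTML;

// One match per row: skip the empty first cell, then capture the event, date
// and city cells into named groups. "s" lets "." cross newlines; the lazy
// ".*?" stops each group at the first "</td>".
$pattern = '~<tr>\s*<td></td>\s*<td>(?P<event>.*?)</td>\s*<td>(?P<date>.*?)</td>\s*<td>(?P<city>.*?)</td>\s*</tr>~s';

preg_match_all($pattern, $s, $rows, PREG_SET_ORDER);

$events = array();
foreach ($rows as $row) {
    $events[] = array(
        // Drop the <a> tag and decode entities such as &amp;
        'event' => html_entity_decode(strip_tags($row['event']), ENT_QUOTES | ENT_HTML5, 'UTF-8'),
        'date'  => $row['date'],
        'city'  => $row['city'],
    );
}
print_r($events);
```

If a row deviates from this shape (a missing cell, extra attributes on <td>), the pattern simply skips it, so it is worth comparing the number of matches with the number of <tr> tags.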

Upvotes: 0

Views: 981

Answers (2)

ntaso

Reputation: 614

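$matches = array();
// Capture the contents of each <tr>...</tr>; "s" lets "." match newlines, "U" makes ".*" lazy
preg_match_all("/<tr>(.*)<\/tr>/sU", $s, $matches);
$trs = $matches[1];
$td_matches = array();
foreach ($trs as $tr) {
    $tdmatch = array();
    // Capture the contents of each <td>...</td> inside the current row
    preg_match_all("/<td>(.*)<\/td>/sU", $tr, $tdmatch);
    $td_matches[] = $tdmatch[1];
}
print_r($td_matches);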

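Put your HTML string in $s. $td_matches then holds a nested array: one inner array per <tr>, containing the contents of that row's <td> cells (the empty first cell included). The captured contents are raw HTML, so the first event still carries its <a> tag and entities such as &amp; are not decoded.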
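A minimal sketch of turning those per-row cell arrays into separate event, date, and city values, assuming four cells per row with an unused first cell, as in the question. The two-row sample string and the $events array are hypothetical; the two preg_match_all() calls are the same ones used above.

```php
<?php
// Hypothetical input: two rows in the same shape as the question's table.
$s = "<tr>\n  <td></td>\n  <td><a href=\"http://www.altassets.net/ventureforum/\" target=\"_blank\">AltAssets Venture Forum</a></td>\n  <td>27 March 2014</td>\n  <td>London, UK</td>\n</tr>\n"
   . "<tr>\n  <td></td>\n  <td>LP-GP Forum: Infrastructure &amp; Real Estate</td>\n  <td>7 October 2014</td>\n  <td>London, UK</td>\n</tr>\n";

// Rows first, then the cells within each row.
preg_match_all("/<tr>(.*)<\/tr>/sU", $s, $matches);
$events = array();
foreach ($matches[1] as $tr) {
    preg_match_all("/<td>(.*)<\/td>/sU", $tr, $tdmatch);
    $cells = $tdmatch[1];
    // Assumes exactly four cells per row; anything else is skipped.
    if (count($cells) !== 4) {
        continue;
    }
    // Ignore the empty first cell.
    list(, $event, $date, $city) = $cells;
    $events[] = array(
        // Strip the <a> tag and decode &amp; and similar entities.
        'event' => html_entity_decode(strip_tags($event), ENT_QUOTES | ENT_HTML5, 'UTF-8'),
        'date'  => trim($date),
        'city'  => trim($city),
    );
}
print_r($events);
```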

Upvotes: 1

Paul Way

Reputation: 1951

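You should be able to use the lazy pattern <td>.+?</td>. Because .+? needs at least one character and . does not match newlines by default, the empty <td></td> cells produce no match as long as each cell sits on its own line. Wrap the middle in a group, <td>(.+?)</td>, to capture only the cell contents.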
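A minimal sketch of that pattern with a capture group, assuming (as in the question) one cell per line and three non-empty cells per row, so every three captures form one event, date, city triple. The sample string and the $events array are hypothetical. If the HTML were minified onto a single line, a sequence like <td></td><td>X</td> would be captured as one bogus cell, so this relies on the layout shown above.

```php
<?php
// Hypothetical input in the same one-cell-per-line layout as the question.
$s = <<<'HTML'
<tr>
    <td></td>
    <td>Envirotech &amp; Clean Energy Investor Summit</td>
    <td>4-5 November 2014</td>
    <td>London, UK</td>
</tr>
<tr>
    <td></td>
    <td>AltAssets Fundraising &amp; IR Forum</td>
    <td>9 December 2014</td>
    <td>Hong Kong</td>
</tr>
HTML;

// Lazy "+?" needs at least one character and, without the "s" modifier,
// "." stops at newlines, so the empty <td></td> cells are never matched.
preg_match_all('~<td>(.+?)</td>~', $s, $m);

// Group the remaining captures into event, date, city triples.
$events = array();
foreach (array_chunk($m[1], 3) as $triple) {
    if (count($triple) === 3) {
        list($event, $date, $city) = $triple;
        $events[] = array(
            'event' => html_entity_decode(strip_tags($event), ENT_QUOTES | ENT_HTML5, 'UTF-8'),
            'date'  => $date,
            'city'  => $city,
        );
    }
}
print_r($events);
```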

Upvotes: 1
