omg

Reputation: 139832

What's the most effective way to extract data from a <table> with PHP?

For example, to pull key/value pairs out of HTML like the following:

<tr> 
          <td id="td3"  class="td3"  bgcolor="#FFFFFF" colspan="4">■ Related Information </td>

        </tr>
        <tr> 
          <td id="td5" class="td5" width="10%">job title:</td>
          <td id="td5" class="td5" width="90%" colspan="3">Sales Representative</td>
        </tr>
        <tr> 
          <td id="td5" class="td5" width="10%">Date:</td>

          <td id="td5" class="td5" width="40%">2009-9-15</td>
        </tr>
        <tr> 
          <td id="td5" class="td5" width="10%">Location:</td>

          <td id="td5" class="td5" width="40%">Jiangyin</td>
        </tr>
        <tr> 
          <td id="td5" class="td5" width="10%">Degree:</td>
          <td id="td5" class="td5" width="40%">Bachelor</td>

          <td id="td5" class="td5" width="10%">Major:</td>
          <td id="td5" class="td5" width="40%">No limit</td>
        </tr>
        <tr> 
          <td id="td5" class="td5" width="10%">Sex:</td>
          <td id="td5" class="td5" width="40%">No limit</td>
        </tr>
        <tr> 
          <td id="td5" class="td5" width="10%">Type:</td>
          <td id="td5" class="td5" width="40%">Fulltime</td>
          <td id="td5" class="td5" width="10%"></td>
          <td id="td5" class="td5" width="40%"></td>
        </tr>

I'm tired of writing long regular expressions. Is there an easier way to do this?

Upvotes: 0

Views: 138

Answers (3)

ryeguy

Reputation: 66851

You should try the lesser-known PHP Simple HTML DOM Parser. It lets you do things like this:

// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');

// Find all images
foreach ($html->find('img') as $element) {
    echo $element->src . '<br>';
}

// Find all links
foreach ($html->find('a') as $element) {
    echo $element->href . '<br>';
}

// Create DOM from string
$html = str_get_html('<div id="hello">Hello</div><div id="world">World</div>');

$html->find('div', 1)->class = 'bar';

$html->find('div[id=hello]', 0)->innertext = 'foo';

echo $html; // Output: <div id="hello">foo</div><div id="world" class="bar">World</div>

Upvotes: 1

bucabay

Reputation: 5295

You could use some simple regular expressions:

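// $html is assumed to hold the table markup as a string
$values = array();
// First isolate each row (this pattern expects a bare <tr> with no attributes)
if (preg_match_all("/<tr>(.*?)<\/tr>/is", $html, $matches)) {
    foreach ($matches[1] as $match) {
        // Then collect the cell texts; [^<]+ skips empty cells and cells with nested tags
        if (preg_match_all("/<td[^>]*>([^<]+)<\/td>/is", $match, $tds)) {
            $values[] = $tds[1];
        }
    }
}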

var_dump($values);

It is a lot simpler when you separate the patterns instead of using one single large pattern.
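
The $values array holds one list of cell texts per row, not key/value pairs yet. Here is a minimal sketch of the pairing step, assuming the cells in each row alternate label, value, label, value as they do in the question's table. pairCells is a hypothetical helper name, not a PHP built-in, and the sample rows are typed in by hand to stand in for the regex output:

```php
<?php
// Hypothetical helper: turns rows of cell texts into a label => value map.
function pairCells(array $rows): array
{
    $pairs = [];
    foreach ($rows as $cells) {
        // Assumption: cells alternate label, value within a row.
        // A row with a single cell (such as a heading) has no partner and is skipped.
        for ($i = 0; $i + 1 < count($cells); $i += 2) {
            $label = rtrim(trim($cells[$i]), ':');
            $pairs[$label] = trim($cells[$i + 1]);
        }
    }
    return $pairs;
}

// Hand-written stand-in for the $values array produced by the regex approach.
$values = [
    ['Related Information'],
    ['job title:', 'Sales Representative'],
    ['Degree:', 'Bachelor', 'Major:', 'No limit'],
];

print_r(pairCells($values));
```

Because the cell pattern skips empty cells, a row whose value cell is empty can throw the alternation off, so this pairing is only as reliable as the markup is regular.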

Upvotes: 2

Gumbo

Reputation: 655129

Use an HTML or XML parser like DOMDocument or SimpleXML. Then you can simply traverse the DOM and fetch the data you want.
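
As a rough sketch of what that traversal could look like with DOMDocument: the markup below is a shortened stand-in modeled on the question's table, and the pairing logic assumes the cells in each row alternate label, value. Both are assumptions made for illustration.

```php
<?php
// Shortened sample markup modeled on the question (an assumption for this sketch).
$html = <<<HTML
<table>
  <tr><td class="td3" colspan="4">Related Information</td></tr>
  <tr><td class="td5">job title:</td><td class="td5" colspan="3">Sales Representative</td></tr>
  <tr><td class="td5">Degree:</td><td class="td5">Bachelor</td>
      <td class="td5">Major:</td><td class="td5">No limit</td></tr>
  <tr><td class="td5">Type:</td><td class="td5">Fulltime</td>
      <td class="td5"></td><td class="td5"></td></tr>
</table>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parse warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$pairs = [];
foreach ($doc->getElementsByTagName('tr') as $row) {
    // Gather the trimmed text of every cell in this row.
    $cells = [];
    foreach ($row->getElementsByTagName('td') as $cell) {
        $cells[] = trim($cell->textContent);
    }
    // Assumption: cells alternate label, value. A lone heading cell is skipped.
    for ($i = 0; $i + 1 < count($cells); $i += 2) {
        $label = rtrim($cells[$i], ':');
        if ($label !== '') { // ignore the empty filler cells
            $pairs[$label] = $cells[$i + 1];
        }
    }
}

print_r($pairs);
```

Unlike a regular expression, the parser tolerates attributes, odd whitespace, and most malformed tags, so the only thing this sketch depends on is the label/value layout of the cells.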

Upvotes: 5
