Marius

HTML DOM Parser - getting plain text

Hello, I have a script that gets HTML data from a website.

The website's markup looks like this:

<table class="table table-hover">
<tr>
    <td><b>Cover</b></td>
    <td><b>Platz</b></td>
    <td><b>Titel</b></td>
    <td><b>Videolink</b></td>
</tr>
<tr>
    <td><a href="http://www.youtube.com" target="_blank"><img src="youtube.jpg" /></a></td>
    <td>1</td>
    <td><a href="http://www.youtube.com" target="_blank">name</a></td>
    <td><input type="text" onclick="this.select()" id="1" size="45" name="1" value="http://www.youtube.com" /></td>
</tr>
<tr>
    <td><a href="http://www.youtube.com2" target="_blank"><img src="youtube.jpg2" /></a></td>
    <td>1</td>
    <td><a href="http://www.youtube.com2" target="_blank">name2</a></td>
    <td><input type="text" onclick="this.select()" id="2" size="45" name="2" value="http://www.youtube.com2" /></td>
</tr>
</table>

PHP

<?php

include 'core/functions/dom.php';
include 'core/init.php';

$url = "http://MYWEBSITE";
$html = file_get_html($url);

$theData = array();

// Collect the innertext of every <td>, row by row
foreach ($html->find('table tr') as $row) {
    $rowData = array();
    foreach ($row->find('td') as $cell) {
        $rowData[] = $cell->innertext;
    }
    $theData[] = $rowData;
}

// $theData[0] is the header row, so index 2 is the second data row
$list = $theData[2];
$name = $list[3];
echo $name;

?>

The data is now stored in a variable, but when I echo it, it is a link:

<a href="http://www.youtube.com2" target="_blank">name2</a>

(you can see this when you view the page source)

I just need "name2" as plain text so that I can put it in my database.

Another problem is that it echoes a text field. There I also just need the text:

<input type="text" onclick="this.select()" id="2" size="45" name="2" value="http://www.youtube.com2" />

Here I need the value of the input as text for my database.

Answers (1)

Kitson88

You can achieve this with PHP's built-in DOMDocument class. After instantiating the object and loading the HTML into it, call getElementsByTagName(), which returns the matching elements; each element's nodeValue holds its text content (the non-tag data) without the surrounding markup.

Code:

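<?php

// loadHTML() needs a string of HTML, not the object returned by file_get_html()
$html = file_get_contents("http://MYWEBSITE"); // placeholder URL from the question

$dom = new DOMDocument;
libxml_use_internal_errors(true); // suppress warnings about imperfect real-world HTML
$dom->loadHTML($html);

// Every <a> element in the document
$result = $dom->getElementsByTagName('a');

foreach ($result as $v) {
    // href attribute, then the link text (nodeValue contains no tags)
    echo $v->getAttribute('href') . ' ' . $v->nodeValue;
    echo '<br>';
}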

Output:

http://www.youtube.com
http://www.youtube.com name
http://www.youtube.com2
http://www.youtube.com2 name2
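The question also asks for the value of the `<input>` in the Videolink column, which nodeValue alone won't give you, because an `<input>` has no text content. Below is a minimal sketch along the same DOMDocument lines, assuming the column order shown in the question (Cover, Platz, Titel, Videolink); `extractRows` is a hypothetical helper name. It walks each `<tr>`, takes the text of the Titel cell and the `value` attribute of the input, and skips the header row because that row has no input:

```php
<?php

/**
 * Hypothetical helper: returns one ['title' => ..., 'link' => ...] entry per
 * data row. Assumes the question's column order, so Titel is cell index 2.
 */
function extractRows(string $html): array
{
    $dom = new DOMDocument();
    // Real-world HTML often triggers libxml warnings; collect them silently.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $rows = [];
    foreach ($dom->getElementsByTagName('tr') as $tr) {
        $cells  = $tr->getElementsByTagName('td');
        $inputs = $tr->getElementsByTagName('input');
        // The header row has no <input>, so skip it (and any short row).
        if ($cells->length < 4 || $inputs->length === 0) {
            continue;
        }
        $rows[] = [
            // nodeValue is the cell's text content with all tags stripped.
            'title' => trim($cells->item(2)->nodeValue),
            // The URL lives in the input's value attribute, not in its text.
            'link'  => $inputs->item(0)->getAttribute('value'),
        ];
    }
    return $rows;
}

// Usage with a trimmed-down copy of the question's markup:
$html = <<<'HTML'
<table class="table table-hover">
<tr><td><b>Cover</b></td><td><b>Platz</b></td><td><b>Titel</b></td><td><b>Videolink</b></td></tr>
<tr>
    <td><a href="http://www.youtube.com2" target="_blank"><img src="youtube.jpg2" /></a></td>
    <td>1</td>
    <td><a href="http://www.youtube.com2" target="_blank">name2</a></td>
    <td><input type="text" id="2" size="45" name="2" value="http://www.youtube.com2" /></td>
</tr>
</table>
HTML;

foreach (extractRows($html) as $row) {
    echo $row['title'] . ' ' . $row['link'] . "\n";
}
```

For this sample it prints `name2 http://www.youtube.com2`; each entry of the returned array could then be written to the database.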

See: http://php.net/manual/en/domdocument.getelementsbytagname.php

Edit:

I've updated the code so it outputs the URL (href) and the text value, if any, of each <a> tag.
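If you would rather keep the question's Simple HTML DOM loop, the strings it collects with innertext are HTML fragments, and PHP's built-in strip_tags() reduces the Titel cell to its text. It cannot recover the input's URL, which lives in an attribute, so this sketch reads that one with DOMDocument instead. The two strings are hypothetical stand-ins copied from the question's output:

```php
<?php

// Hypothetical stand-ins for two innertext strings from the question's loop.
$titleCell = '<a href="http://www.youtube.com2" target="_blank">name2</a>';
$linkCell  = '<input type="text" onclick="this.select()" id="2" size="45" name="2" value="http://www.youtube.com2" />';

// strip_tags() removes the markup and keeps the text between the tags.
$title = trim(strip_tags($titleCell));

// An <input> has no text content, so strip_tags($linkCell) returns ''.
// Read its value attribute with DOMDocument instead.
$dom = new DOMDocument();
libxml_use_internal_errors(true); // ignore warnings about the bare fragment
$dom->loadHTML($linkCell);
$link = $dom->getElementsByTagName('input')->item(0)->getAttribute('value');

echo $title . "\n"; // name2
echo $link . "\n";  // http://www.youtube.com2
```

Simple HTML DOM nodes also expose a `plaintext` property, which may be the more direct fix inside the question's loop; check the documentation for the library version you include.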
