Dmitry Makovetskiyd

Reputation: 7053

Scraping with simple_html_dom

I am trying to scrape this:

<a id="pa1">Site1</a>
<font size="-1">Text1</font><br />
<font size="-1" color="green">Text2</font><br />

I can get to pa1 easily, but I want to get to the two font elements that come after it. So I used this:

$html = new simple_html_dom();
$html->load($document);

foreach ($html->find('#pa1>font') as $e) {
    $this->check_line_two = $this->process_array_elements($e->innertext);
}

foreach ($html->find('#pa1>font>font') as $e) {
    $this->check_line_three = $this->process_array_elements($e->innertext);
}

Neither worked. How can I get the next element with simple_html_dom?

Upvotes: 0

Views: 319

Answers (3)

Awemo

Reputation: 895

If that is all you are trying to scrape, why don't you just select the font tags?

foreach ($html->find('font') as $e) {
    $this->check_line_two = $this->process_array_elements($e->innertext);
}

Or is there a possibility that more font tags are present in the document?

Upvotes: 0

feeela

Reputation: 29932

There is no font element nested inside #pa1, so the child selector > cannot match anything.

What you are looking for is the sibling selector +: #pa1 + font. But I don't know if it is supported by the library you are using.

Please read their documentation: http://simplehtmldom.sourceforge.net/manual.htm
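
In case the library turns out not to support sibling selectors, here is a minimal sketch of the same idea using PHP's built-in DOM extension and XPath instead of simple_html_dom. Note that + would only match the element immediately after #pa1 (the first font), while the XPath following-sibling axis used here reaches every later sibling. The wrapping div and the variable names are my own assumptions, added only to make the fragment self-contained.

```php
<?php
// Markup from the question, wrapped in a <div> (an assumption) so the
// fragment has a single root element.
$document = '<div><a id="pa1">Site1</a>'
    . '<font size="-1">Text1</font><br />'
    . '<font size="-1" color="green">Text2</font><br /></div>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($document);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Every <font> that comes after #pa1 and shares its parent.
$fonts = $xpath->query('//*[@id="pa1"]/following-sibling::font');

$texts = [];
foreach ($fonts as $font) {
    $texts[] = $font->textContent;
}

print_r($texts);
```

The $texts array should then hold the text of both font elements in document order, which could be passed on to something like process_array_elements() from the question.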

Upvotes: 2

Daniel Sloof

Reputation: 12706

As feeela said, those font elements are siblings of the anchor, not descendants of it. Try something like this:

foreach ($html->find('#pa1') as $e) {
    $firstFontElement = $e->next_sibling();
}
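
The second font is not directly next to the first one, because a br sits between them, so a single step to the next sibling is not enough to reach it. As a rough sketch of walking the sibling chain until it ends, here is the same idea written against PHP's built-in DOM extension rather than simple_html_dom, where nextSibling also returns text nodes and therefore needs a type check. The wrapping div and the names $lineTwo and $lineThree are assumptions of mine.

```php
<?php
// Markup from the question, wrapped in a <div> (an assumption) so the
// fragment has a single root element.
$document = '<div><a id="pa1">Site1</a>'
    . '<font size="-1">Text1</font><br />'
    . '<font size="-1" color="green">Text2</font><br /></div>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($document);
libxml_clear_errors();

// Locate the anchor once, then step through its siblings by hand.
$anchor = (new DOMXPath($dom))->query('//a[@id="pa1"]')->item(0);

$lines = [];
for ($node = $anchor->nextSibling; $node !== null; $node = $node->nextSibling) {
    // Skip text nodes and the <br> elements; keep only <font>.
    if ($node instanceof DOMElement && $node->nodeName === 'font') {
        $lines[] = trim($node->textContent);
    }
}

// First and second <font> after the anchor.
[$lineTwo, $lineThree] = $lines;
echo $lineTwo, "\n", $lineThree, "\n";
```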

Upvotes: 2
