I'm trying to scrape a webpage using PHP Simple HTML DOM Parser.
$html = str_get_html('<div class="namepageheader">
<div class="u">Name: <a href="someurl">Noor Shaad</a>
<div class="u">Age: </div>
</div>');
$name = $html->find('div[class="u"]', 0)->innertext;
$age = $html->find('div[class="u"]', 1)->innertext;
I tried my best to get the text from each class="u" element, but it didn't work because the first <div class="u"> is missing its closing </div> tag. Can anyone help me out with that?
Upvotes: 0
Views: 162
Reputation: 86
You can find an element close to where the tag should have been closed and then repair the HTML with a string replacement.
For example, you can replace the </a> tag with </a></div>:
$html = str_replace('</a>', '</a></div>', $html);
Or, if the page contains many </a> tags, replace only the </a> that is directly followed by <div class="u">, so that </a><div class="u"> becomes </a></div><div class="u">:
$html = str_replace('</a><div class="u">', '</a></div><div class="u">', $html);
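To see the difference between the two replacements, here is a minimal sketch that counts how many places each one touches. The input string is hypothetical (the broken block on one line, followed by an unrelated link), and the variable names are only for illustration. It uses the optional fourth argument of str_replace, which receives the number of replacements made.

```php
<?php
// Hypothetical input: the broken block followed by an unrelated link.
$html = '<div class="u">Name: <a href="someurl">Noor Shaad</a><div class="u">Age: </div>'
      . '<p><a href="other">Other link</a></p>';

// Broad replacement: every </a> gets a </div>, including the unrelated one.
$broad = str_replace('</a>', '</a></div>', $html, $broadCount);

// Narrow replacement: only the </a> directly followed by <div class="u">.
$narrow = str_replace('</a><div class="u">', '</a></div><div class="u">', $html, $narrowCount);

echo $broadCount, "\n";   // 2
echo $narrowCount, "\n";  // 1
```

The broad version also adds a stray </div> after the second link, which is why the narrower search string is safer on real pages.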
There may be another problem: if there is whitespace (such as a newline) between the tags, the search string does not match and the replacement does nothing. To solve this, first remove the whitespace between the tags and then do the replacement.
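$html = '<div class="namepageheader">
<div class="u">Name: <a href="someurl">Noor Shaad</a>
<div class="u">Age: </div>
</div> ';
// Remove whitespace between tags so the search string below matches.
$html = preg_replace('~>\s+<~', '><', $html);
// Close the first <div class="u"> right after the link.
$html = str_replace('</a><div class="u">', '</a></div><div class="u">', $html);
// Parse the repaired markup (str_get_html comes from simple_html_dom.php).
$dom = str_get_html($html);
$name = $dom->find('div[class="u"]', 0)->innertext;
$age = $dom->find('div[class="u"]', 1)->innertext;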
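The whitespace problem can be checked without the parser. The sketch below uses only built-in functions and the same markup as above; the variable names ($search, $compact, $repaired and so on) are only for illustration. It counts the replacements before and after the whitespace between tags is removed.

```php
<?php
// Same markup as above, with a newline between </a> and the next <div>.
$html = '<div class="namepageheader">
<div class="u">Name: <a href="someurl">Noor Shaad</a>
<div class="u">Age: </div>
</div> ';

$search  = '</a><div class="u">';
$replace = '</a></div><div class="u">';

// Without normalizing whitespace, the search string never matches.
$untouched = str_replace($search, $replace, $html, $before);

// Collapse whitespace between tags, then repair.
$compact  = preg_replace('~>\s+<~', '><', $html);
$repaired = str_replace($search, $replace, $compact, $after);

echo $before, ' ', $after, "\n"; // 0 1
echo $repaired, "\n";
```

Note that this pattern removes whitespace between any two tags, including inline ones, so it is best applied only to the fragment that needs repairing.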
Upvotes: 1