RanaHaroon

Reputation: 453

How to get the text before a specific HTML tag using the HTML DOM parser in PHP

I can't figure out how to get the text between HTML tags. In my case, the text I need is not wrapped in any tag except the paragraph tag <p>.

    <div class="entry clearfix">
    <p>111</p>
    <p><img class="alignnone size-medium wp-image-38376" src="1.jpg" alt="Talvar" /></p>
    <p><strong>111: </strong>111<br/>
        <strong>111:</strong> 111<br/>
        <strong>111:</strong> 111 111<br/>
        <strong>111: </strong>111<br/>
        <strong>111: </strong>1111
    </p>
    <p><strong>111</strong></p>
    <p>
        <strong>01 &#8211;</strong> data1 <strong><a href="#">Download</a><br/>
        </strong><em>222</em><br/>
        <strong>02 &#8211;</strong> data2 <strong><a href="#">Download</a><br/>
        </strong><em>222</em><br/>
        <strong>03 &#8211;</strong> data3 <strong><a href="#">Download</a><br/>
        </strong><em>222</em><br/>
        <strong>04 &#8211;</strong> data4 <strong><a href="#">Download</a><br/>
        </strong><em>222</em>
    </p>
    <p><strong>222</strong></p>
    <p><strong><a href="" target="_blank">3333</a></strong></p>
    <p><strong>eb</strong></p></div>

I need data1, data2, data3 and data4. To get them, I select the fifth <p>, which is index 4 in the zero-based result array:

    foreach($html->find('div[class="entry"]') as $row){
        $a = $row->find('p',4);
        echo $dt = $a->find('text',1)->plaintext; // returns only data1
    }

data1, data2, data3 and data4 are not wrapped in any tag other than <p>. If I use strip_tags(), it returns all of the text, including 111, Download, 222, etc. Please advise how I can get just the data series.
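A minimal reproduction of the strip_tags() problem, assuming a shortened copy of the fifth <p> (only the first two rows) held in a plain string rather than loaded from the page:

```php
<?php
// Shortened copy of the fifth <p> from the sample markup (first two rows only).
$p = '<p>
    <strong>01 &#8211;</strong> data1 <strong><a href="#">Download</a><br/>
    </strong><em>222</em><br/>
    <strong>02 &#8211;</strong> data2 <strong><a href="#">Download</a><br/>
    </strong><em>222</em>
</p>';

// strip_tags() removes the tags but keeps the text of every element,
// so the labels, the link text and the <em> content stay mixed in with data1/data2.
$text = strip_tags($p);
echo $text;
```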

Upvotes: 4

Views: 450

Answers (1)

sinisake

Reputation: 11318

I'm not sure there's a more elegant way, but this should work:

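    foreach ($html->find('div[class="entry"]') as $row) {
        $p = $row->find('p', 4);     // fifth <p> inside the entry div
        $markup = $p->outertext;     // the paragraph's HTML as a string

        // remove every <strong> and <em> element, including its text and links
        foreach ($p->find('strong') as $tag) {
            $markup = str_replace($tag->outertext, '', $markup);
        }
        foreach ($p->find('em') as $tag) {
            $markup = str_replace($tag->outertext, '', $markup);
        }

        echo strip_tags($markup, '<br>'); // if you want to keep br tags
    }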

The idea is to use str_replace to remove the <strong> and <em> elements (together with their text content and the links inside them) from the targeted <p>, and keep whatever is left. As long as your HTML has the same structure as the sample you posted, this should work.
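If you are not tied to Simple HTML DOM, a different technique worth considering (PHP's built-in DOM extension, not what this answer uses) is an XPath `text()` step, which selects only the text nodes that are direct children of the <p>, so anything inside <strong>, <a> or <em> is skipped automatically. This is a sketch under stated assumptions: the markup below is a shortened copy of the question's sample, and the query assumes the data always sits in the fifth <p> of the `entry` div.

```php
<?php
// Shortened copy of the question's markup; the fifth <p> keeps only two rows.
$html = '<div class="entry clearfix">
<p>111</p>
<p><img src="1.jpg" alt="Talvar" /></p>
<p><strong>111: </strong>111</p>
<p><strong>111</strong></p>
<p>
    <strong>01 &#8211;</strong> data1 <strong><a href="#">Download</a><br/>
    </strong><em>222</em><br/>
    <strong>02 &#8211;</strong> data2 <strong><a href="#">Download</a><br/>
    </strong><em>222</em>
</p>
</div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect libxml warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Fifth <p> of the div whose class list contains "entry"; text() returns only
// the text nodes that are direct children of that <p>.
$query = '//div[contains(concat(" ", normalize-space(@class), " "), " entry ")]/p[5]/text()';

$items = [];
foreach ($xpath->query($query) as $node) {
    $value = trim($node->nodeValue);   // drop the whitespace around each item
    if ($value !== '') {
        $items[] = $value;             // skip whitespace-only nodes between tags
    }
}

print_r($items);
```

With the full markup from the question, the same query should yield data1 through data4. Compared with the str_replace approach, this avoids relying on the serialized HTML of each removed element matching the paragraph's HTML exactly.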

Upvotes: 1
