dceccon

Reputation: 451

Remove HTML tag by content

A program outputs this table (as a string, which I load into a DOMDocument in PHP):

<table>
    <tr>
        <td width="50">Â </td>
        <td>My content</td>
        <td width="50">Â </td>
    </tr>
</table>

I need to remove the two <td width="50">Â </td> cells (I don't know why the program adds them, but it does) so that the table looks like this:

<table>
    <tr>
        <td>My content</td>
    </tr>
</table>

What's the best way to do this in PHP?

Edit: the program is JasperReports Server. I call its report-rendering function from a web application:

// Call the report server library to generate the report
$reportGen = $reportServer->runReport($myReport);

$domDoc = new \DOMDocument();
$domDoc->loadHTML($reportGen);
// Return only the first <table> element of the rendered report
return $domDoc->saveHTML($domDoc->getElementsByTagName('table')->item(0));

This returns the table shown above, which is the one I need to fix.
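The `Â ` in those cells is most likely a UTF-8 non-breaking space (bytes C2 A0) that `loadHTML()` decoded as ISO-8859-1, which libxml's HTML parser assumes when the markup declares no charset. A minimal sketch of loading the report as UTF-8 follows, assuming the string returned by the report server really is UTF-8; the helper name `loadUtf8Html` and the sample markup are hypothetical stand-ins for the `runReport()` output.

```php
<?php
// Hypothetical helper: load HTML that is assumed to be UTF-8 encoded.
function loadUtf8Html(string $html): DOMDocument
{
    $doc = new DOMDocument();
    // Without a declared charset the HTML parser falls back to ISO-8859-1,
    // so prepend a meta tag that declares UTF-8 before the report markup.
    $doc->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html);
    return $doc;
}

// Stand-in for the report output: padding cells containing a UTF-8 NBSP (U+00A0).
$reportGen = "<table><tr><td width=\"50\">\u{00A0}</td><td>My content</td><td width=\"50\">\u{00A0}</td></tr></table>";

$doc = loadUtf8Html($reportGen);
$firstCell = $doc->getElementsByTagName('td')->item(0);

// With the charset declared, the padding cell holds a single NBSP, not "Â" + NBSP.
var_dump($firstCell->nodeValue === "\u{00A0}");
```

Once the cells are decoded this way, any code that matches on the literal `Â ` has to look for U+00A0 instead.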

Upvotes: 0

Views: 209

Answers (2)

Vegeta

Reputation: 1317

Try this:

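<?php
    $domDoc = new DOMDocument();
    $domDoc->loadHTML($reportGen);
    $xpath = new DOMXPath($domDoc);
    $tags = $xpath->query('//td');
    foreach ($tags as $tag) {
        $value = $tag->nodeValue;
        // Remove the cells whose text starts with "Â " (the padding cells)
        if (preg_match('/^(Â )/', $value)) {
            $tag->parentNode->removeChild($tag);
        }
    }
    // Output the cleaned table
    echo $domDoc->saveHTML($domDoc->getElementsByTagName('table')->item(0));
?>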
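A variant of the same DOM approach can select the padding cells by their attribute and then check that their text is only whitespace, a non-breaking space, or the mis-decoded `Â`, so it works whether or not the document was loaded as UTF-8. This is a sketch, assuming the unwanted cells are always marked `width="50"`; the function name `removePaddingCells` and the sample markup are hypothetical.

```php
<?php
// Hypothetical helper: remove <td width="50"> cells that contain no real text.
function removePaddingCells(DOMDocument $doc): void
{
    $xpath = new DOMXPath($doc);
    // Copy the matches into a plain array first so removals cannot affect the loop.
    $cells = iterator_to_array($xpath->query('//td[@width="50"]'));
    foreach ($cells as $cell) {
        // Accept only whitespace, U+00A0, or "Â" (an NBSP decoded as ISO-8859-1).
        if (preg_match('/^[\s\x{00A0}Â]*$/u', $cell->textContent)) {
            $cell->parentNode->removeChild($cell);
        }
    }
}

// Sample markup standing in for the report output (assumption).
$html = '<table><tr><td width="50">' . "\u{00A0}" . '</td><td>My content</td><td width="50">'
      . "\u{00A0}" . '</td></tr></table>';

$doc = new DOMDocument();
$doc->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html);
removePaddingCells($doc);

$result = $doc->saveHTML($doc->getElementsByTagName('table')->item(0));
echo $result;
```

Filtering on `width="50"` keeps a real content cell from being removed just because its text happens to begin with a space.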

Upvotes: 1

Rafael Soufraz

Reputation: 964

Use a regex, then a plain string replace:

$var = '<table>
    <tr>
        <td width="50">Â </td>
        <td>My interesting content</td>
        <td width="50">Â </td>
    </tr>
</table>';

// Empty the content of every <td width="50" ...> cell
$final = preg_replace('#(<td width="50".*?>).*?(</td>)#', '$1$2', $var);
// Then delete the now-empty cells
$final = str_replace('<td width="50"></td>', '', $final);

echo $final;
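The two steps above can also be folded into a single `preg_replace()` call that removes each padding cell together with the indentation and line break in front of it, so no blank lines are left behind. This is a sketch, assuming the cells are always written literally as `<td width="50"` and never contain a nested `</td>`; the function name `stripPaddingCells` is hypothetical.

```php
<?php
// Hypothetical helper: drop every <td width="50">...</td> cell and the
// whitespace that precedes it. The "s" modifier lets "." span newlines.
function stripPaddingCells(string $html): string
{
    return preg_replace('#\s*<td width="50"[^>]*>.*?</td>#s', '', $html);
}

$var = "<table>\n    <tr>\n        <td width=\"50\">Â </td>\n        <td>My content</td>\n"
     . "        <td width=\"50\">Â </td>\n    </tr>\n</table>";

$out = stripPaddingCells($var);
echo $out;
```

A regex like this only holds while the generator's markup stays this regular; parsing with DOMDocument is less sensitive to changes in attribute order or spacing.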

Upvotes: 0
