preg_replace (or other) to remove duplicate tags

Question

I'm merging two HTML files together, and as a result the combined output contains duplicate document-level tags, including the head section. Is there a way to get preg_replace to remove only the second batch of duplicate tags, so that the content of the second document blends in without any problems?

If not with preg_replace, is there another way of doing this?

Conceptual Information:

In this instance, there are two files. There will be more eventually.

Each file starts off like this:

My script is taking those files (which live in some directory), and creating a NEW file that combines both outputs. However, the result of this is something along the lines of:






blah blah blah





blah blah blah 2

This creates duplicate tags. The desired output would be:






Blah blah blah
Blah blah blah 2

Essentially, I want to cut out the head data from every HTML file except the first one as the files are processed in a while loop.

Thanks so much!

rid · Accepted Answer

You can just apply the tag removal on the second HTML before you merge it, then merge the first HTML with the stripped second HTML.

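Here's a sketch for when you have more HTMLs to merge. Here strip_duplicate_tags() is only a placeholder for whatever you do to strip the tags, and $htmls_to_merge is assumed to be an array of HTML strings:

$merged = '';
$strip_tags = false; // false for the first file, true for every later one
foreach ($htmls_to_merge as $html) {
    if ($strip_tags) {
        // Placeholder: replace with your own tag-stripping code.
        $html = strip_duplicate_tags($html);
    }
    $merged .= $html; // merge
    $strip_tags = true;
}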
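
The tag-stripping step is left open in the loop above. As a minimal sketch of one way to fill it in, the code below keeps the first document whole and pulls only the body content out of each later one with preg_match. The helper names extract_body_content() and merge_html_documents() and the sample strings are made up for illustration, and it assumes each file has a single, ordinary body element.

```php
<?php
// Hypothetical helper: return only what sits between <body ...> and </body>.
// If no body element is found, the input is returned unchanged (trimmed).
function extract_body_content(string $html): string
{
    if (preg_match('~<body\b[^>]*>(.*?)</body\s*>~is', $html, $matches)) {
        return trim($matches[1]);
    }
    return trim($html);
}

// Hypothetical helper: keep the first document whole and place the body
// content of every later document just before its closing </body> tag.
function merge_html_documents(array $htmls): string
{
    if ($htmls === []) {
        return '';
    }
    $first = array_shift($htmls);
    $extra = '';
    foreach ($htmls as $html) {
        $extra .= extract_body_content($html) . "\n";
    }
    $pos = strripos($first, '</body');
    if ($pos === false) {
        // No closing body tag in the first document: just append.
        return $first . "\n" . $extra;
    }
    return substr($first, 0, $pos) . $extra . substr($first, $pos);
}

// Example input (made up for illustration).
$first  = "<html><head><title>One</title></head><body>\nblah blah blah\n</body></html>";
$second = "<html><head><title>Two</title></head><body>\nblah blah blah 2\n</body></html>";
$merged = merge_html_documents([$first, $second]);
echo $merged, "\n";
```

A regular expression like this is fragile: it can be fooled by, for example, the text </body> appearing inside a script or comment. If the input files are not under your control, parsing them with PHP's DOMDocument is likely a safer route.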
