Zack Tanner

Reputation: 2590

preg_replace (or other) to remove duplicate tags

I'm merging two HTML files together, and as a result the output has duplicate <head> </head>, <html> </html> and <body> </body> tags. Is there a way to get preg_replace to remove only the second batch of duplicate tags, so that the content of the second document blends into the first without any problems?

If not with preg_replace, is there another way of doing this?

Conceptual Information:

In this instance, there are two files. There will be more eventually.

Each file starts off like this:

<html>
<head>
<style type='text/css'>
(Template Data)
</style>
</head>
<body>

My script is taking those files (which live in some directory), and creating a NEW file that combines both outputs. However, the result of this is something along the lines of:

<html>
<head>
<style type='text/css'>
(Template Data)
</style>
</head>
<body>
blah blah blah
<html>
<head>
<style type='text/css'>
(Template Data)
</style>
</head>
<body>
blah blah blah 2

This creates duplicate tags. The desired output would be:

<html>
<head>
<style type='text/css'>
(Template Data)
</style>
</head>
<body>
blah blah blah
blah blah blah 2

Essentially, I want to cut out the head data from every HTML file except the first one. The files are processed in a while loop.

Thanks so much!

Upvotes: 0

Views: 672

Answers (2)

rid

Reputation: 63472

You can strip the tags from the second HTML document before you merge it, then merge the first document with the stripped second one.

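Here's a sketch if you have more HTMLs to merge. The regex is just one possible way to strip the tags (an assumption, adjust it to your templates):

$merged = '';
$strip_tags = false; // false for the first document, true for every later one
foreach ($htmls_to_merge as $html) {
    if ($strip_tags) {
        // Drop everything up to and including the opening <body> tag.
        $html = preg_replace('~^.*?<body\b[^>]*>\s*~is', '', $html, 1);
    }
    $merged .= $html; // append this document to the merged output
    $strip_tags = true;
}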
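
One way to fill in the stripping step while also handling the closing tags, as a minimal sketch. It assumes each file may end with </body> and </html>, and that the merged result should be closed exactly once. The function name merge_html_documents and both regexes are hypothetical, not something from the question. Regex-based stripping like this only suits simple, predictable templates such as the one shown in the question.

```php
<?php
// Hypothetical helper: the function name and both regexes are assumptions
// made for illustration, not part of the question or the answer.
function merge_html_documents(array $documents): string
{
    $parts = [];
    foreach (array_values($documents) as $index => $html) {
        // Drop a trailing </body> and </html> (plus trailing whitespace), if present.
        $html = preg_replace('~\s*(?:</body\s*>)?\s*(?:</html\s*>)?\s*$~i', '', $html, 1) ?? $html;
        if ($index > 0) {
            // For every document after the first, drop everything up to and
            // including the opening <body> tag (html, head and style block).
            $html = preg_replace('~^.*?<body\b[^>]*>\s*~is', '', $html, 1) ?? $html;
        }
        $parts[] = $html;
    }
    // Close the merged document exactly once.
    return implode("\n", $parts) . "\n</body>\n</html>\n";
}
```

To use it on files in a directory, you could pass in the file contents, for example array_map('file_get_contents', glob($dir . '/*.html')), where $dir is whatever directory your script already reads from.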

Upvotes: 1

yarian

Reputation: 6032

You can try SoftSnow Merger. Not a very hacker-y way of doing things but as long as it works...

Upvotes: 0
