Reputation: 2590
I'm merging two HTML files together, and as a result the output has duplicate <html> </html>, <head> </head> and <body> </body> tags. Is there a way to get preg_replace to remove only the second batch of duplicate tags, so that the content of the second document blends in without any problems?
If not with preg_replace, is there another way of doing this?
Conceptual Information:
In this instance, there are two files. There will be more eventually.
Each file starts off like this:
<html>
<head>
<style type='text/css'>
(Template Data)
</style>
</head>
<body>
My script is taking those files (which live in some directory), and creating a NEW file that combines both outputs. However, the result of this is something along the lines of:
<html>
<head>
<style type='text/css'>
(Template Data)
</style>
</head>
<body>
blah blah blah
<html>
<head>
<style type='text/css'>
(Template Data)
</style>
</head>
<body>
blah blah blah 2
This creates duplicate tags. The desired output would be:
<html>
<head>
<style type='text/css'>
(Template Data)
</style>
</head>
<body>
blah blah blah
blah blah blah 2
Essentially, I want to cut out everything from <html> through <body> for every HTML file except the first one processed by the while loop.
Thanks so much!
Upvotes: 0
Views: 672
Reputation: 63472
You can just apply the tag removal on the second HTML before you merge it, then merge the first HTML with the stripped second HTML.
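Here's an example if you have more HTMLs to merge (the preg_replace pattern is only one possible way to strip the tags, and it assumes each file has an opening <body> tag):
$merged = '';
$strip_tags = false;
foreach ($htmls_to_merge as $html) { // $htmls_to_merge: array of HTML strings
    if ($strip_tags) { // this will be false in the first iteration, then true
        // drop everything up to and including the opening <body> tag
        $html = preg_replace('~^.*?<body\b[^>]*>~is', '', $html, 1);
    }
    $merged .= $html; // merge
    $strip_tags = true;
}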
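A fuller, self-contained sketch of the same loop, in case the files also end with closing </body> and </html> tags. The function name merge_html and the sample strings are made up for illustration, and the regexes assume reasonably simple markup like the template in the question; this is not a general HTML parser.

```php
<?php
// Hypothetical helper (name and behaviour are assumptions, not part of the
// original answer): merges HTML strings, keeping the <html>/<head>/<body>
// preamble of the first document only.
function merge_html(array $htmls): string
{
    $merged = '';
    $had_closing = false;
    foreach (array_values($htmls) as $i => $html) {
        // Remove closing </body> and </html> tags; they are re-added once at the end.
        $html = preg_replace('~</(?:body|html)\s*>~i', '', $html, -1, $count);
        $had_closing = $had_closing || $count > 0;
        if ($i > 0) {
            // Drop everything up to and including the opening <body ...> tag.
            $html = ltrim(preg_replace('~^.*?<body\b[^>]*>~is', '', $html, 1));
        }
        $merged .= rtrim($html) . "\n";
    }
    // Only close the document if at least one input had closing tags.
    return $had_closing ? $merged . "</body>\n</html>\n" : $merged;
}

// Sample inputs modelled on the template from the question.
$head   = "<html>\n<head>\n<style type='text/css'>\n(Template Data)\n</style>\n</head>\n<body>\n";
$first  = $head . "blah blah blah\n</body>\n</html>\n";
$second = $head . "blah blah blah 2\n</body>\n</html>\n";

echo merge_html([$first, $second]);
```

With the two sample strings, the output contains a single <html>, <head> and <body>, followed by both "blah blah blah" lines and one closing </body></html> pair. If a later file has no <body> tag, the pattern does not match and that file is appended unchanged, so you may want to check for that case.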
Upvotes: 1
Reputation: 6032
You can try SoftSnow Merger. Not a very hacker-y way of doing things but as long as it works...
Upvotes: 0