Reputation:
I have a lot of HTML files that contain unwanted line feeds. These break things like inline JavaScript and formatting within the pages. I want a way to strip out all line feeds that do not appear directly after an HTML tag, e.g. </div>. Does anyone know of a regex and/or program that may be able to achieve this?
Upvotes: 1
Views: 489
Reputation: 7640
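You can use a negative lookbehind to match the line feeds:
<?php
$buffer = file_get_contents('test.html');
// Remove every line break that is not directly preceded by </div>.
// The second lookbehind keeps CRLF pairs intact: without it, the \n in
// "</div>\r\n" would be stripped because it follows \r, not </div>.
$buffer = preg_replace('|(?<!</div>)(?<!</div>\r)\r?\n|', "", $buffer);
file_put_contents('test.new.html', $buffer);
?>
See: http://www.regular-expressions.info/lookaround.html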
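The snippet above only protects line feeds after </div>, while the question asks about any HTML tag. A minimal sketch of one way to generalize it, assuming that a ">" directly before the line break always marks the end of a tag; the function name strip_inner_line_breaks and the sample string are made up for illustration:

```php
<?php
// Hypothetical helper (not part of the answer above): removes line breaks
// unless the character directly before them is ">", i.e. the end of a tag.
function strip_inner_line_breaks(string $html): string
{
    // (?<!>)    the break is not directly preceded by ">"
    // (?<!>\r)  nor by ">\r", so the \n of a CRLF after a tag survives
    // \r?\n     one LF or CRLF line break
    return preg_replace('~(?<!>)(?<!>\r)\r?\n~', '', $html);
}

// Illustrative input: only the break inside the script body is removed.
$sample = "<div>\n<script>var a = 'x' +\n'y';</script>\n</div>\n";
echo strip_inner_line_breaks($sample);
```

Note that a ">" in ordinary text or in script code (for example a comparison at the end of a line) is treated like a tag end, so the break after it is kept.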
Upvotes: 0
Reputation: 72560
You may be able to use Notepad++'s search/replace function with a regular expression to catch most of this.
Something like:
([^>])\n(.+)
Replaced with:
\1 \2
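For anyone who wants to try this pattern outside Notepad++, here is a rough sketch of the same search and replace in PHP (a different tool from the one this answer uses). One pass does not join every line, because each match consumes the whole following line, so the break after that line is skipped. The helper name join_broken_lines and the sample text are assumptions made for illustration:

```php
<?php
// Hypothetical helper: applies the pattern above repeatedly, because one
// pass consumes the following line and so skips every second break.
function join_broken_lines(string $text): string
{
    do {
        // $count receives the number of replacements made in this pass.
        $text = preg_replace('~([^>])\n(.+)~', '$1 $2', $text, -1, $count);
    } while ($count > 0);
    return $text;
}

// Illustrative input: the break after </div> follows ">" and is kept.
echo join_broken_lines("one\ntwo\nthree\n</div>\nnext");
```

With this input the first three lines and the closing tag end up on one line separated by spaces, and "next" stays on its own line. In Notepad++ the equivalent is to run Replace All again until nothing more is replaced.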
Upvotes: 1