Nick Ryall

Reputation:

Remove unwanted line feeds from an HTML file

I have a lot of HTML files that contain unwanted line feeds. These break things like inline JavaScript and the formatting within the pages. I want a way to strip out every line feed that does not appear directly after an HTML tag, e.g. </div>. Does anyone know of a regex and/or program that can achieve this?

Upvotes: 1

Views: 489

Answers (2)

Lance Rushing

Reputation: 7640

You can use a negative lookbehind so that the pattern matches only the line feeds that are not preceded by the tag:

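<?php

$buffer = file_get_contents('test.html');

// Remove every line break (\n or \r\n) not directly preceded by </div>.
// The second lookbehind stops the \n of a \r\n pair that follows </div>
// from being matched on its own, which would leave a stray \r behind.
$buffer = preg_replace('|(?<!</div>)(?<!</div>\r)\r?\n|', '', $buffer);

file_put_contents('test.new.html', $buffer);
?>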

see: http://www.regular-expressions.info/lookaround.html
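
The question asks about line feeds after any HTML tag, not only </div>. A minimal sketch of how the same lookbehind idea might be widened, assuming every tag ends in `>` and that a `>` never ends a line anywhere else (the function name `stripStrayLineFeeds` and the sample string are made up for illustration):

```php
<?php
// Hypothetical helper: removes line breaks unless the character
// directly before the break is '>' (taken here to mean "end of a tag").
function stripStrayLineFeeds(string $html): string
{
    // (?<!>)   : the break does not directly follow '>'
    // (?<!>\r) : and this is not the \n half of a \r\n that follows '>'
    // \r?\n    : a Unix (\n) or Windows (\r\n) line break
    return preg_replace('/(?<!>)(?<!>\r)\r?\n/', '', $html);
}

// Illustrative input only: two breaks inside inline text, one inside a paragraph.
$sample = "<div>\nvar a =\n 1;\n</div>\n<p>one\r\ntwo</p>\r\n";
echo stripStrayLineFeeds($sample);
```

Breaks that follow a `>` are kept and all others are dropped. Because the replacement is an empty string, text on either side of a removed break is joined directly ("one" and "two" become "onetwo"); replacing with a single space instead may be the safer choice for prose.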

Upvotes: 0

DisgruntledGoat

Reputation: 72560

You may be able to catch most of these with a regular expression in Notepad++'s search/replace function.

Something like:

([^>])\n(.+)

Replaced with:

\1 \2
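
For anyone who would rather script this than run it in an editor, the same search/replace pair can be tried in PHP. This is only a sketch: it assumes Unix (\n) line endings, and the helper name `joinBrokenLines` is invented for the example. Because `(.+)` consumes the whole following line, one pass joins only every other break in a run of broken lines, so the replacement is repeated until the text stops changing.

```php
<?php
// Hypothetical helper: applies the pattern above until nothing more changes.
function joinBrokenLines(string $text): string
{
    do {
        $before = $text;
        // '$1 $2' is PHP's spelling of the \1 \2 replacement.
        $text = preg_replace('/([^>])\n(.+)/', '$1 $2', $text);
    } while ($text !== $before);

    return $text;
}

// Illustrative input: a paragraph broken across three lines.
echo joinBrokenLines("<p>one\ntwo\nthree</p>\n<div>\nfour</div>");
```

Line feeds that directly follow a `>` are left alone, and each removed line feed is replaced by a single space.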

Upvotes: 1
