Mevia

Reputation: 1564

PHP: highlighting HTML tags within loaded HTML content

I've been working lately on an application that allows users to create an HTML template and use it by copying the code. It all worked nicely, but the files were just too big: with ~300 lines of HTML it became difficult to maintain some sort of order and quickly find the portion of code that needed to be replaced or fixed. I've seen some JavaScript libraries for highlighting, but I don't want that; I want to create something simple and PHP-based for my own use only.

This is what I've got so far:

<style>
body {
    font-size:30px;
}

.div_tag {
    color:blue;
}
.a_tag {
    color:green;
}
</style>

<body>
<?php
ob_start();
include 'content.php';
$source = ob_get_contents();
ob_end_clean();

$all_lines = explode("\n", $source);

foreach($all_lines as $line) {
    echo preg_replace(array(
        '/<div>/',
        '/<\/div>/',
        '/<a>/',
        '/<\/a>/',
        '/    /',
        '/        /'
    ), array(
        '<span class="div_tag">&lt;div&gt;</span>',
        '<span class="div_tag">&lt;/div&gt;</span>',
        '<span class="a_tag">&lt;a&gt;</span>',
        '<span class="a_tag">&lt;/a&gt;</span>',
        '&nbsp;&nbsp;&nbsp;&nbsp;',
        '&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;'
    ), $line) . '<br />';
}
?>
</body>

For testing purposes, the content.php file looks like this:

<div>
    <div>
        <a>Source</a>
    </div>
</div>

Now, the problems I'm having. First and most important, I wonder if there is a way to handle tabs. Right now I have to type literally 4 spaces instead of a tab to make a line look indented; a real tab character isn't handled and simply disappears from the output, leaving neither a tab nor spaces, which is highly problematic.

The second problem is with HTML tags. In that basic example it works OK, but if I use something like <img src="sth" /> or even <a href="sth">sth</a>, it obviously breaks. I figure the regex needs to be written more accurately, but since I'm just starting to learn regular expressions I don't know how to handle it.

For now I've only prepared <div> and <a>, but once I understand how to make it more adaptable I will include more, like <img>, <span>, <h1>, <h2>, <h3> (and up), <p> and so on.

Upvotes: 1

Views: 63

Answers (1)

Alex

Reputation: 14618

Parsing HTML with regular expressions is not the right approach. You'd have to account for a lot of cases, and although PHP's regex engine (PCRE) supports recursive patterns, it is a slippery slope when it comes to HTML. The simplest pattern that accounts for HTML attributes is this one, which matches an opening or self-closing tag:

'/<(\w+).*?\/?>/'

A similar approach works for the closing tag:

'/<\/(\w+)>/'

However, this fails in these situations:

  • There is a ">" symbol in an attribute value (especially likely in a JavaScript event handler)
  • You want to parse the inner HTML as well
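The first failure case can be reproduced with a short script. This is only an illustrative sketch: the variable names and the sample markup are made up for the demonstration, and the tag name is captured as a whole word.

```php
<?php
// Opening/self-closing tag pattern; the tag name is captured in group 1.
$openTag = '/<(\w+).*?\/?>/';

// Hypothetical sample: the attribute value itself contains ">".
$html = '<a onclick="if (a > b) go()">link</a>';

preg_match($openTag, $html, $m);

// The lazy .*? stops at the first ">", which sits inside the attribute value.
echo $m[0], "\n"; // <a onclick="if (a >
echo $m[1], "\n"; // a
```

The match ends in the middle of the attribute, so everything after it would be treated as plain text by a highlighter built on this pattern.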

A recursive pattern is required if you want to parse the inner HTML of a tag up to its own closing tag. You can use a backreference to the captured tag name to look for the matching closing tag. But it is hell. And even then, with so many languages being output together with HTML (JavaScript, CSS, server-side templates), there are cases where even the most flexible regex flavor with the best pattern won't be able to parse HTML correctly.
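To see why a plain backreference is not enough, here is a small sketch. The pattern and the sample string are assumptions made for illustration, not something the question uses. It pairs an opening tag with the next closing tag of the same name, which goes wrong as soon as the same tag is nested:

```php
<?php
// \1 refers back to the tag name captured by (\w+); the s flag lets "." match newlines.
$element = '/<(\w+)[^>]*>(.*?)<\/\1>/s';

// Hypothetical sample with a <div> nested inside another <div>.
preg_match($element, '<div><div>inner</div> tail</div>', $m);

// The outer <div> gets paired with the inner </div>.
echo $m[2], "\n"; // <div>inner
```

Getting the outer element right would need a recursive pattern that counts nested openings, which is where the complexity explodes.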

However, for highlighting simple HTML, the above will do.
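A minimal sketch of such a highlighter follows. Several things here are assumptions rather than part of the question: the function name `highlight_html` is hypothetical, a tab is displayed as four non-breaking spaces, only leading indentation is converted, CSS classes are derived from the tag name (`div_tag`, `a_tag`, `img_tag`, ...), and PHP 7.4+ is needed for the arrow function. The source is escaped first, so the pattern is applied to `&lt;` and `&gt;` instead of the raw angle brackets.

```php
<?php
// Hypothetical helper, not from the original post.
function highlight_html(string $source): string
{
    $out = [];
    foreach (preg_split('/\R/', $source) as $line) {
        // Split the leading indentation (spaces or tabs) from the rest of the line.
        preg_match('/^([ \t]*)(.*)$/', $line, $m);

        // Browsers collapse whitespace, so indentation is converted explicitly.
        // Assumption: one tab is shown as four non-breaking spaces.
        $indent = str_replace(["\t", ' '], ['&nbsp;&nbsp;&nbsp;&nbsp;', '&nbsp;'], $m[1]);

        // Escape first, so the markup is displayed instead of rendered.
        $code = htmlspecialchars($m[2], ENT_QUOTES, 'UTF-8');

        // Wrap every escaped opening, closing or self-closing tag in a span
        // whose class is built from the lowercased tag name.
        $code = preg_replace_callback(
            '/&lt;\/?(\w+).*?&gt;/',
            fn (array $t): string => '<span class="' . strtolower($t[1]) . '_tag">' . $t[0] . '</span>',
            $code
        );

        $out[] = $indent . $code;
    }

    return implode("<br />\n", $out);
}
```

With `$source` captured as in the question, `echo highlight_html($source);` would replace the `foreach` loop. The limitation described above still applies: a ">" inside an attribute value is escaped to `&gt;` too, so the span closes early in that case.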

Upvotes: 1
