Thanh Trung

Reputation: 3804

Regular expression: Match text node only in a tag

I've been working on a highlight script. The first version came out of an earlier question: "substring selector with jquery?"

The script: http://jsfiddle.net/TPg9p/3/

Unfortunately, it only works with a simple string. I want it to work with strings that contain tags.

Example :

<li>sample string li span style="color:red" id 
    <span id="toto" style="color:red">color id</span> 
    abcde
</li>

So if the user searches for span, it should match only the word span in the text inside the <li> that comes before the <span> tag, not the span tag itself. The matched string is then replaced with <span class="highlight">span</span>. The same goes for attribute names and attribute values: anything inside an opening or closing tag should be ignored.

Since HTML is about the DOM and nodes, could we parse this string into nodes and then select only the text nodes for replacement?

Please answer by updating the jsFiddle above.

UPDATED

Demo of working solution by Tibos : http://jsfiddle.net/TPg9p/10/

Upvotes: 0

Views: 2837

Answers (2)

raina77ow

Reputation: 106365

Instead of trying to isolate the right substring with regexes, work with text nodes only:

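$('#submit').click(function () {
    // Escape regex metacharacters so the search term is matched literally.
    var replacePattern = new RegExp(
        $('#search').val().replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&'),
        'gi');
    // Take #sample and its direct children, skipping existing highlights.
    $('#sample').children().addBack().not('.highlight')
      .contents().filter(function() {
        // nodeType 3 is a text node.
        return this.nodeType === 3;
    }).replaceWith(function(){
        // The returned string is parsed as HTML and replaces the text node.
        return this.data.replace(replacePattern, '<b class="highlight">$&</b>');
    });
});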

Demo.

Explanation: first you collect the #sample element and its descendants (direct children only if children() is used; find('*') would collect descendants at every depth). Then .highlight elements are filtered out of that selection. This step is optional, but it made little sense to me to highlight within something that's already highlighted.

Once you have all the elements to be processed, you collect their child nodes with .contents() and filter that collection with the nodeType check so that only text nodes remain. Finally, you run .replaceWith() over that collection.

Note that the pattern is defined outside the replaceWith callback, since it stays constant for the duration of a single click handler.
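For readers without jQuery, the same node-type filter can be sketched in plain JavaScript. This is only an illustration, not the code from the demo: collectTextNodes and its skipClass parameter are hypothetical names, and the sketch assumes DOM-like nodes that expose nodeType, childNodes and a string className. Unlike the children() version, it descends to any depth.

```javascript
// Collect every text node under `root`, at any depth.
// Assumption: nodes look like DOM nodes, i.e. `nodeType` is 3 for text
// and 1 for elements, and `childNodes` is index-addressable with a `length`.
// `skipClass` (hypothetical parameter) names a class whose subtrees are skipped.
function collectTextNodes(root, skipClass) {
    var found = [];

    function hasClass(node, name) {
        var classes = ' ' + (node.className || '') + ' ';
        return classes.indexOf(' ' + name + ' ') !== -1;
    }

    function walk(node) {
        if (node.nodeType === 3) {
            found.push(node);
            return;
        }
        // Do not descend into elements that are already highlighted.
        if (node.nodeType === 1 && skipClass && hasClass(node, skipClass)) {
            return;
        }
        var kids = node.childNodes || [];
        for (var i = 0; i < kids.length; i++) {
            walk(kids[i]);
        }
    }

    walk(root);
    return found;
}
```

In a browser this might be called as collectTextNodes(document.getElementById('sample'), 'highlight'); each returned node's data could then be run through the replace pattern.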

Upvotes: 1

Tibos

Reputation: 27823

Disclaimer: You should use an HTML parser instead of a regexp here.

The regular expression you are looking for is this one:

/span(?=[^>]*<)/

Example usage:

var str = '<li>sample string li span style="color:red" id ' + 
    '<span id="toto" style="color:red">color id</span> ' +
    'abcde' +
    '</li>';
var keyword = 'span';
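var regexp = new RegExp(keyword + '(?=[^>]*<)');
// replace() returns a new string and leaves str untouched, so keep the result.
// Without the 'g' flag only the first match is replaced.
var highlighted = str.replace(regexp, '<span class="highlight">$&</span>');

The lookahead makes the regexp match your word only when the next angle bracket after it is a < rather than a >, which means the word sits in text and not inside a tag.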
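To see which occurrences the lookahead accepts, here is a small sketch that lists the match positions in the sample string. The helper matchIndexes and the variable names are made up for illustration, and the 'g' flag is added so that every occurrence is visited. The second call shows a limitation that supports the disclaimer: a literal > in the text hides a word that precedes it.

```javascript
var sample = '<li>sample string li span style="color:red" id ' +
    '<span id="toto" style="color:red">color id</span> ' +
    'abcde' +
    '</li>';

// Return the start index of every match of `regexp` in `text`.
// The regexp must carry the 'g' flag, otherwise exec() never advances.
function matchIndexes(regexp, text) {
    var indexes = [];
    var match;
    while ((match = regexp.exec(text)) !== null) {
        indexes.push(match.index);
    }
    return indexes;
}

// Only the "span" in the leading text qualifies. The ones in
// <span ...> and </span> reach a ">" before any "<".
var hits = matchIndexes(/span(?=[^>]*<)/g, sample);

// Limitation: here "span" is plain text, but the ">" after it
// makes the lookahead fail, so nothing is matched.
var missed = matchIndexes(/span(?=[^>]*<)/g, 'span > 1 <b>x</b>');

console.log(hits, missed);
```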

EDIT: Since your string isn't valid HTML (it doesn't have to start or end with a tag), you can change the regular expression to also accept the end of the string instead of the beginning of a tag:

/span(?=[^>]*(?:<|$))/

DEMO: http://jsfiddle.net/TPg9p/8/
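The difference between the two patterns shows up on a string that ends in plain text. The snippet below is a sketch with a made-up input and a hypothetical mark helper; it wraps each match in square brackets so the result is easy to read.

```javascript
// Made-up fragment that starts and ends with text rather than a tag.
var fragment = 'intro span <span class="x">span</span> trailing span';

// Wrap every match in brackets so the matches are visible.
function mark(text, regexp) {
    return text.replace(regexp, '[$&]');
}

// Requires a "<" after the word: the trailing "span" is skipped.
var tagOnly = mark(fragment, /span(?=[^>]*<)/g);

// Also accepts the end of the string: the trailing "span" is matched too.
var tagOrEnd = mark(fragment, /span(?=[^>]*(?:<|$))/g);

console.log(tagOnly);
console.log(tagOrEnd);
```

In both cases the tag names in <span class="x"> and </span> are left alone, because a > follows them before any < or the end of the string.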

EDIT: Added regexp escaping: .replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&'), courtesy of this answer: Is there a RegExp.escape function in Javascript?
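Put together, the escaping and the lookahead could be combined roughly as follows. This is a sketch, not the code from the demo: escapeRegExp and highlight are hypothetical names, the 'gi' flags are an assumption (replace every occurrence, ignore case), and the guard against an empty keyword is an addition.

```javascript
// Escape regex metacharacters so user input is matched literally.
function escapeRegExp(text) {
    return text.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
}

// Wrap occurrences of `keyword` that are not inside a tag.
function highlight(html, keyword) {
    // An empty keyword would match at every position, so bail out early.
    if (!keyword) {
        return html;
    }
    var pattern = new RegExp(
        escapeRegExp(keyword) + '(?=[^>]*(?:<|$))',
        'gi');
    return html.replace(pattern, '<span class="highlight">$&</span>');
}

// Without escaping, the "." in "a.b" would also match "axb".
console.log(highlight('<li>a.b axb</li>', 'a.b'));
```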

Upvotes: 1
