Reputation: 1113

Matching hashes using regex, but not when they are part of a URL

I am struggling with a regex in JavaScript that needs to match the text after # up to the first word boundary, but not when it is part of a URL. So:

#test - should match test
sometext#test2 - should match test2
xx moretext#test3 - should match test3
http://test.com#tab1 - should not match tab1

I am replacing the text after the hash with a link (but not the hash character itself). There can be more than one hash in the text, and it should match them all (I guess I should use /g for that).

Matching the part after the hash is quite easy: /#\b(.+?)\b/g, but not matching it if the string itself starts with "http" is something I cannot solve. I should probably use a negative look-around, but I am having problems getting my head around that.

Any help is greatly appreciated!

Upvotes: 1

Answers (3)

David Thomas

Reputation: 253466

Since regular expressions are often (if not always) fairly expensive to use, I'd suggest using basic string and array methods to determine whether a given set of characters represents a URL (I'm assuming that all URLs contain the string http):

$('ul li').each(
    function() {
        var t = $(this).text(),
            words = t.split(/\s+/),
            foundHashes = [],
            word = '';
        for (var i = 0, len = words.length; i < len; i++) {
            word = words[i];
            if (word.indexOf('http') === -1 && word.indexOf('#') !== -1) {
                var match = word.substring(word.indexOf('#') + 1);
                foundHashes.push(match);
            }
        }
        // the following just shows what, if anything, was found
        // and can definitely be safely omitted
        if (foundHashes.length) {
            var newSpan = $('<span />', {
                'class': 'matchedWords'
            }).text(foundHashes.join(', ')).appendTo($(this));
        }
    });

JS Fiddle demo (with some timing information printed to the console).
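
The same split-and-check idea can be tried without jQuery or a page to run against. This is only a minimal sketch: findHashWords is a hypothetical helper name, and it keeps the assumption that any word containing "http" is a URL.

```javascript
// Hypothetical helper: returns the text after '#' for every word that is not a URL.
function findHashWords(text) {
  var found = [];
  text.split(/\s+/).forEach(function (word) {
    var hashAt = word.indexOf('#');
    // Assumption (same as in the answer): a word containing "http" is a URL.
    if (hashAt !== -1 && word.indexOf('http') === -1) {
      found.push(word.substring(hashAt + 1));
    }
  });
  return found;
}

console.log(findHashWords('#test sometext#test2 http://test.com#tab1'));
// [ 'test', 'test2' ]
```

Note that this takes everything after the # up to the next whitespace, so trailing punctuation such as a comma would be included in the result.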

References:

jQuery:
- appendTo().
- each().
- text().
'Vanilla' JavaScript

Upvotes: 0

elclanrs

Reputation: 94131

Try this regex using a negative lookahead instead, since JS didn't support lookbehinds when this was written (they were only added in ES2018):

/^(?!http:\/\/).*#\b(.+?)\b/

You may want to check for www too, depending on your conditions.
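
To see what that pattern does, here is a small sketch that runs it against the four sample lines from the question, treating each line as its own string (that one-string-per-line split is an assumption):

```javascript
var re = /^(?!http:\/\/).*#\b(.+?)\b/;

// The four sample lines from the question, one string each.
var samples = ['#test', 'sometext#test2', 'xx moretext#test3', 'http://test.com#tab1'];

var results = samples.map(function (s) {
  var m = re.exec(s);
  return m ? m[1] : null; // null when the string starts with http://
});

console.log(results);
// [ 'test', 'test2', 'test3', null ]
```

Because the pattern is anchored with ^, it tests the whole string at once: it yields at most one capture per string, and only a URL at the very start of the string blocks the match.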

Edit: Then you can do this:

var m = re.exec(str); // re is the regex above; exec returns null when nothing matches
if (m) str = str.replace(m[1], 'replaced!');

http://jsfiddle.net/j7c79/2/

Edit 2: Sometimes a regex alone is not the way to go if it gets too complicated. Try a different approach:

var txt = "asdfgh http://asdf#test1 #test2 woot#test3";

function replaceHashWords(str, rep) {
  // Check each whitespace-separated token on its own, so a URL only
  // protects its own hash; the captured whitespace is kept by split().
  return str.split(/(\s+)/).map(function (token) {
    return /^http/.test(token) ? token : token.replace(/#\b(.+?)\b/g, '#' + rep);
  }).join('');
}

alert(replaceHashWords(txt, 'replaced!'));
// asdfgh http://asdf#test1 #replaced! woot#replaced!

Upvotes: 1

Niet the Dark Absol

Reputation: 324790

This would require a lookbehind, something JavaScript lacked until ES2018 added lookbehind assertions.
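
In engines that do implement ES2018 lookbehind assertions, the check could look roughly like the sketch below. It assumes such an engine, assumes URLs start with http:// or https://, uses \w+ for the word after the hash rather than the question's .+? pattern, and replaceOutsideUrls is a hypothetical name.

```javascript
// Assumes an ES2018+ engine; older engines throw a SyntaxError on (?<!...).
// The lookbehind rejects a '#' that is preceded by http(s):// plus non-space characters.
var hashWord = /(?<!https?:\/\/\S*)#(\w+)/g;

function replaceOutsideUrls(text, rep) {
  // Keep the '#' and replace only the word that follows it.
  return text.replace(hashWord, function () {
    return '#' + rep;
  });
}

console.log(replaceOutsideUrls('#test sometext#test2 http://test.com#tab1', 'X'));
// #X sometext#X http://test.com#tab1
```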

However, if your subject string is some HTML and those URLs are in href attributes, you can create a document out of it and search for text nodes, only replacing their nodeValues instead of the whole HTML string.
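
A rough sketch of that text-node walk follows. replaceInTextNodes is a hypothetical helper, it substitutes plain text rather than inserting real link elements, and it relies only on the nodeType, nodeValue and childNodes properties of DOM nodes.

```javascript
// Hypothetical helper: rewrites hash words in text nodes only, so attribute
// values such as href="http://test.com#tab1" are never touched.
function replaceInTextNodes(node, rep) {
  var TEXT_NODE = 3; // numeric value of Node.TEXT_NODE in the DOM
  if (node.nodeType === TEXT_NODE) {
    node.nodeValue = node.nodeValue.replace(/#\b(.+?)\b/g, function () {
      return '#' + rep; // keep the hash, replace the word after it
    });
    return;
  }
  // Recurse into child nodes; the tree structure itself is not modified.
  var children = node.childNodes || [];
  for (var i = 0; i < children.length; i++) {
    replaceInTextNodes(children[i], rep);
  }
}
```

In a browser this might be called as replaceInTextNodes(document.body, 'replaced!'). A URL that appears as visible text rather than inside an attribute would still be rewritten, so this only covers the href case described above.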

Upvotes: 0
