Reputation: 1113
I am struggling with a regex in JavaScript that needs to match the text after # up to the first word boundary, but not match it if it is part of a URL. So:
#test - should match test
sometext#test2 - should match test2
xx moretext#test3 - should match test3
http://test.com#tab1 - should not match tab1
I am replacing the text after the hash with a link (but not the hash character itself). There can be more than one hash in the text, and it should match them all (I guess I should use /g for that).
Matching the part after the hash is quite easy: /#\b(.+?)\b/g, but not matching it if the string itself starts with "http" is something I cannot solve. I should probably use a negative look-around, but I am having problems getting my head around that.
Any help is greatly appreciated!
Upvotes: 1
Views: 254
Reputation: 253466
As regex is often (if not always) quite expensive to use, I'd suggest using basic string and array methods to determine whether a given set of characters represents a URL (though I'm assuming that all URLs will start with the http string):
$('ul li').each(function() {
    var t = $(this).text(),
        words = t.split(/\s+/),
        foundHashes = [],
        word = '';
    for (var i = 0, len = words.length; i < len; i++) {
        word = words[i];
        // skip anything that looks like a URL; keep words containing a hash
        if (word.indexOf('http') === -1 && word.indexOf('#') !== -1) {
            var match = word.substring(word.indexOf('#') + 1);
            foundHashes.push(match);
        }
    }
    // the following just shows what, if anything, was found
    // and can definitely be safely omitted
    if (foundHashes.length) {
        var newSpan = $('<span />', {
            'class': 'matchedWords'
        }).text(foundHashes.join(', ')).appendTo($(this));
    }
});
JS Fiddle demo (with some timing information printed to the console).
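For readers without jQuery, the same split-and-check idea can be sketched as a plain function. The name findHashWords is hypothetical, and the sketch keeps the answer's assumption that any whitespace-delimited word containing "http" is a URL:

```javascript
// Hypothetical helper: collects the text after '#' in every
// whitespace-delimited word that does not look like a URL.
function findHashWords(text) {
  var words = text.split(/\s+/);
  var found = [];
  for (var i = 0; i < words.length; i++) {
    var word = words[i];
    var hashAt = word.indexOf('#');
    // Assumption: a word containing "http" is a URL and is skipped.
    if (word.indexOf('http') === -1 && hashAt !== -1) {
      found.push(word.substring(hashAt + 1));
    }
  }
  return found;
}

console.log(findHashWords('#test sometext#test2 xx moretext#test3 http://test.com#tab1'));
// [ 'test', 'test2', 'test3' ]
```

Note that this only collects the words; it does not replace them in the original text.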
Upvotes: 0
Reputation: 94131
Try this regex using a negative lookahead instead, since JS didn't support lookbehinds before ES2018:
/^(?!http:\/\/).*#\b(.+?)\b/
You may want to check for www too, depending on your conditions.
Edit: Then you can do this (where re holds the regex above, and assuming it matches):
str = str.replace(re.exec(str)[1], 'replaced!');
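To see what this regex does and does not catch, here is a small sketch. The helper name matchHashWord is hypothetical; it just guards against the null that exec returns when nothing matches:

```javascript
// The lookahead regex from this answer, stored in `re`.
var re = /^(?!http:\/\/).*#\b(.+?)\b/;

// Hypothetical helper: returns the captured hash word,
// or null if the regex does not match at all.
function matchHashWord(str) {
  var m = re.exec(str);
  return m ? m[1] : null;
}

console.log(matchHashWord('#test'));                    // 'test'
console.log(matchHashWord('http://test.com#tab1'));     // null
console.log(matchHashWord('see http://test.com#tab1')); // 'tab1'
```

The last call shows the limitation: the lookahead only rejects strings that begin with http://, so a URL later in the text is still matched. There is also no /g flag, so only one hash word is captured per call.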
Edit 2: Sometimes a regex alone is not the way to go if it gets too complicated. Try a different approach:
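var txt = "asdfgh http://asdf#test1 #test2 woot#test3";
function replaceHashWords(str, rep) {
    var isUrl = /^http/.test(str), result = [];
    // collect the words after each hash, unless the whole string starts with "http"
    if (!isUrl) {
        str.replace(/#\b(.+?)\b/g, function(a, b) { result.push(b); });
    }
    // nothing collected: return early, since an empty alternation would match everywhere
    if (!result.length) { return str; }
    return str.replace(new RegExp('(' + result.join('|') + ')', 'g'), rep);
}
alert(replaceHashWords(txt, 'replaced!'));
// asdfgh http://asdf#replaced! #replaced! woot#replaced!
// (the URL fragment is replaced too, because only the start of the string is tested)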
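If URLs can appear anywhere in the text, one possible variation is to test each whitespace-delimited token separately. This is only a sketch: linkHashWords and makeLink are hypothetical names, a token counts as a URL only if it starts with http:// or https://, and #(\w+) is used as a shorthand for the #\b(.+?)\b pattern from the question:

```javascript
// Hypothetical helper: replaces the word after each '#' with makeLink(word),
// leaving the '#' itself in place and skipping tokens that look like URLs.
function linkHashWords(str, makeLink) {
  // Visit every run of non-whitespace characters; whitespace is left untouched.
  return str.replace(/\S+/g, function (token) {
    // Assumption: URLs start with http:// or https://.
    if (/^https?:\/\//.test(token)) {
      return token;
    }
    return token.replace(/#(\w+)/g, function (match, word) {
      return '#' + makeLink(word);
    });
  });
}

var sample = 'asdfgh http://asdf#test1 #test2 woot#test3';
console.log(linkHashWords(sample, function (word) {
  return '<a href="/tag/' + word + '">' + word + '</a>';
}));
// asdfgh http://asdf#test1 #<a href="/tag/test2">test2</a> woot#<a href="/tag/test3">test3</a>
```

The /tag/ path in the usage example is made up for illustration; any callback that returns a string will do.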
Upvotes: 1
Reputation: 324790
This would require a lookbehind, something sadly lacking from JavaScript's capabilities (lookbehind assertions were only added in ES2018).
However, if your subject string is some HTML and those URLs are in href attributes, you can create a document out of it and search for text nodes, only replacing their nodeValues instead of the whole HTML string.
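For the lookbehind route, engines that implement ES2018 lookbehind assertions allow a single regex along these lines. This is a sketch under assumptions: replaceOutsideUrls and makeLink are hypothetical names, and a hash counts as part of a URL when http:// or https:// precedes it with no whitespace in between:

```javascript
// Requires an engine with ES2018 lookbehind support.
// Matches '#word' unless it is preceded, without whitespace, by http:// or https://.
var hashOutsideUrl = /(?<!https?:\/\/\S*)#(\w+)/g;

// Hypothetical helper: replaces the word after each matching '#',
// keeping the '#' character itself.
function replaceOutsideUrls(str, makeLink) {
  return str.replace(hashOutsideUrl, function (match, word) {
    return '#' + makeLink(word);
  });
}

var input = '#test sometext#test2 xx moretext#test3 http://test.com#tab1';
console.log(replaceOutsideUrls(input, function (word) {
  return '[' + word + ']';
}));
// #[test] sometext#[test2] xx moretext#[test3] http://test.com#tab1
```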
Upvotes: 0