Okku

Reputation: 7819

JavaScript - Matching links not inside <a> or attribute

I'm trying to create a WYSIWYG editor. The goal is that when a user pastes or types in a link (i.e. on a paste or keyup(space) event), the editor detects it in real time and determines whether it's an image, a video or something else.

I tried to work with some libraries suggested in an answer to another question, but those insisted on turning all URLs into links or caused other problems.

I was unsure what the best approach would be. I tried looping over the contents of the input field, but couldn't get that to work with nested elements. So instead I tried converting the HTML contents into a string and replacing the links in that.

Matching a link is not the problem; the internet is full of great regexes. But how do I match only links that are not inside an <a> tag or an attribute of another tag?

I tried adding a negative lookahead, (?!(\<\/a\>|"|')), at the end of the regex (which I know isn't a perfect solution), but apparently it doesn't work the way I thought it would. So I'm completely lost with this.

$(function(){
  document.write(searchLinks("Sample text https://www.google.fi/images/srpr/logo11w.png and http://google.com/ <a href='http://bing.com/'>http://bing.com/</a>"));
});

function searchLinks(string){
	var urlRegex =/\bhttps?:\/\/[a-zA-Z0-9()^=+*@&%#|~?!;:,.-_/]*[-A-Za-z0-9+&@#/%=~_()|](?!(\<\/a\>|"|'))/g;
	console.log(string.match(urlRegex));
	string=string.replace(urlRegex, function(url){
		if(url.match(/\.gifv/)!=null){ //gifv
			return gifvToVideo(url);
		}else if(url.match(/\.(jpeg|jpg|gif|png|svg)/)!=null){ //image
			return "<img src='"+url+"' alt='"+url+"'>";
		}else if(url.match(/\.(mp4|webm)/)!=null){ //video
			return '<video><source src="'+url+'"></video>';
		}else{ //link
			return '<a href="'+url+'" target="_blank">'+url+'</a>';
		}
	});
	return string;
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>

Upvotes: 2

Views: 159

Answers (2)

D.A.H

Reputation: 919

Find URLs outside attributes

You may have links in other HTML elements too.

One option is to search for links that are not inside attributes. The code below isn't bulletproof, but on well-formed HTML it should work in most cases.

If you suspect that your HTML is not well formed, tidy it up before applying the regex below.

PHP example:

preg_match_all( "/(?<!\"|')(http|https|ftp|ftps)\\:\\/\\/[a-zA-Z0-9\\-\\.]+\\.[a-zA-Z]{2,3}(\\/\\S*)?/", $srcText, $rgxMatches) ;

Regex:

(?<!"|')(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?
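
Since the question is about JavaScript, the same idea can be tried there as well. The following is only a sketch, assuming an engine with lookbehind support (ES2018 or later). The name findUrlsOutsideAttributes is made up for illustration, and the path part is narrowed from \S* to [^\s<"']* (an assumption, not part of the regex above) so that a match stops before a following tag or quote.

```javascript
// Hypothetical helper: list URLs that are not directly preceded by a quote,
// i.e. that do not open a quoted attribute value.
// Requires lookbehind support (ES2018+).
function findUrlsOutsideAttributes(html) {
    // Path part narrowed to [^\s<"']* (assumption) so a match ends before
    // the next tag or quote instead of swallowing it.
    var urlRegex = /(?<!["'])(?:https?|ftps?):\/\/[a-zA-Z0-9\-.]+\.[a-zA-Z]{2,3}(?:\/[^\s<"']*)?/g;
    return html.match(urlRegex) || [];
}

// Sample string taken from the question.
var sample = "Sample text https://www.google.fi/images/srpr/logo11w.png and http://google.com/ " +
    "<a href='http://bing.com/'>http://bing.com/</a>";
console.log(findUrlsOutsideAttributes(sample));
```

On the sample string from the question this skips the href value but still returns the link text between <a> and </a>, because that text is preceded by > rather than by a quote. It covers the attribute case only.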

Upvotes: 0

Arun P Johny

Reputation: 388316

I think one option is to build a DOM structure and iterate over only the top-level text nodes, like this:

function searchLinks(html) {
    var $tmp = $('<div />', {
        html: html
    });
    // '-' moved to the end of the first class so that '.-_' is not read as a character range
    var urlRegex = /\bhttps?:\/\/[a-zA-Z0-9()^=+*@&%#|~?!;:,._\/-]*[-A-Za-z0-9+&@#\/%=~_()|](?!(\<\/a\>|"|'))/g;
    $tmp.contents().each(function () {
        if (this.nodeType == Node.TEXT_NODE) {
            var string = this.nodeValue;

            string = string.replace(urlRegex, function (url) {
                if (url.match(/\.gifv/) != null) { //gifv
                    return gifvToVideo(url);
                } else if (url.match(/\.(jpeg|jpg|gif|png|svg)/) != null) { //image
                    return "<img src='" + url + "' alt='" + url + "'>";
                } else if (url.match(/\.(mp4|webm)/) != null) { //video
                    return '<video><source src="' + url + '"></video>';
                } else { //link
                    return '<a href="' + url + '" target="_blank">' + url + '</a>';
                }
            });

            // replaceWith() parses the string as HTML, so the generated tags become elements
            $(this).replaceWith(string);
        }
    });

    return $tmp.html();
}
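
The function above only looks at top-level text nodes, so a URL typed inside a nested element such as <b> or <p> is left alone. One way to extend it, sketched here under the assumption that text inside existing <a> elements should stay untouched, is to gather the text nodes recursively first and run the replacement on each one afterwards. collectLinkableTextNodes is a made-up name, and the numeric constants mirror Node.ELEMENT_NODE and Node.TEXT_NODE.

```javascript
// Hypothetical helper (not part of the answer's code): collect every text
// node under `root`, descending into nested elements but skipping <a> subtrees.
var ELEMENT_NODE = 1; // same value as Node.ELEMENT_NODE
var TEXT_NODE = 3;    // same value as Node.TEXT_NODE

function collectLinkableTextNodes(root, found) {
    found = found || [];
    for (var i = 0; i < root.childNodes.length; i++) {
        var child = root.childNodes[i];
        if (child.nodeType === TEXT_NODE) {
            found.push(child);
        } else if (child.nodeType === ELEMENT_NODE &&
                   child.nodeName.toUpperCase() !== 'A') {
            // Recurse into any element except an existing link.
            collectLinkableTextNodes(child, found);
        }
    }
    return found;
}
```

In searchLinks this could stand in for the $tmp.contents().each(...) filter: call collectLinkableTextNodes($tmp[0]) and apply the same string.replace(...) and $(node).replaceWith(...) steps to each returned node. Collecting the nodes first means the replacements do not disturb the traversal.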


Upvotes: 1
