sagnew

Reputation: 111

Remove a href tags from a string using regex

Hey, this may have been asked elsewhere, but I couldn't seem to find it.

Essentially, I'm just trying to remove the a tags from a string using regex in JavaScript.

So I have this:

<a href="www.google.com">This is google</a>

and I want the output to just be "This is google". How would this be done in JavaScript using regex? Thanks in advance!

SOLUTION:

OK, so the solution my boss provided goes as follows:

The best way to do that is in two parts. One is to remove all closing tags. Then you're going to want to focus on removing the opening tag. For a single link, it should be as straightforward as:

/<a\s+.*?>(.*)<\/a>/

Here .*? is the non-greedy version of "match anything", and the capturing group (.*) holds the link text.
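
A minimal sketch of both ideas, assuming simple links with no > inside attribute values. The helper name stripAnchorTags is made up for illustration and isn't from the original post. The first part applies the two-step approach (closing tags, then opening tags); the second reuses the single pattern with replace and the $1 back-reference.

```javascript
// Hypothetical helper (not from the original post): strip <a> tags in two passes.
function stripAnchorTags(html) {
  // Pass 1: remove every closing </a> tag.
  var withoutClosing = html.replace(/<\/a\s*>/gi, '');
  // Pass 2: remove every opening <a ...> tag along with its attributes.
  return withoutClosing.replace(/<a\b[^>]*>/gi, '');
}

var single = '<a href="www.google.com">This is google</a>';
var twoPass = stripAnchorTags(single);

// Single-pattern alternative: keep only the captured link text via $1.
var onePass = single.replace(/<a\s+.*?>(.*)<\/a>/, '$1');

console.log(twoPass); // This is google
console.log(onePass); // This is google
```

The two-pass version also copes with several links in one string, whereas the greedy (.*) in the single pattern would swallow everything between the first opening tag and the last closing tag.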

Upvotes: 1

Views: 1761

Answers (4)

Arunesh Singh

Reputation: 3535

Try this with a lookahead, then take the first capturing group:

(?=>).([^<]+)
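
For reference, a small sketch of how that pattern could be used in JavaScript, assuming the sample string from the question. The lookahead finds the first >, the . consumes it, and the group captures everything up to the next <.

```javascript
var str = '<a href="www.google.com">This is google</a>';
var re = /(?=>).([^<]+)/;
var match = re.exec(str);

// exec() returns null when nothing matches, so guard before reading group 1.
var linkText = match !== null ? match[1] : null;
console.log(linkText); // This is google
```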


Upvotes: 0

baao

Reputation: 73221

This shouldn't be done with regex at all. Use the DOM instead, for example:

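// Assumes the page contains the <a> markup shown below the script.
// Collect the inner HTML of every <a> element in the document.
var a = document.querySelectorAll('a');
var texts = [].slice.call(a).map(function(val){
   return val.innerHTML;
});
console.log(texts);

<a href="www.google.com">this is google</a>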

If you only have a string containing multiple <a href...> tags, you can create an element first:

var a_string = '<a href="www.google.com">this is google</a><a href="www.yahoo.com">this is yahoo</a>',
el = document.createElement('p');
el.innerHTML = a_string;
var a = el.querySelectorAll('a');
var texts = [].slice.call(a).map(function(val){
   return val.innerHTML;
});
console.log(texts);

Upvotes: 2

sinisake

Reputation: 11318

One more way, using a capturing group. You basically match the whole tag, but grab just the one group you need:

    var re = /<a href=.+>(.+)<\/a>/;
    var str = '<a href="http://www.somesite.com">this is google</a>';
    var m = re.exec(str);

    // exec() returns null when there is no match, so check before reading the group.
    if (m !== null) {
        console.log(m[1]); // the text captured between the tags
    }

https://regex101.com/r/rL0bT6/1 Note: code created by regex101.

Demo: http://jsfiddle.net/ry83mhwc/

Upvotes: -1

Leroy

Reputation: 247

I don't know your case, but if you're using JavaScript you might be able to get the inside of the element with innerHTML. So, element.innerHTML might output This is google.

The reasoning is that regex really isn't meant to parse HTML.

If you really, really want a regex, here you go:

var pattern = />(.*)</;
var string  = '<a href="www.google.com">This is google</a>';
var matches = pattern.exec(string);
console.log(matches[1]); // This is google

This uses a match group to get the stuff inside > and <. This won't work with every case, I guarantee it.
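
To illustrate the limitation, here is a small sketch with two links in one string (the sample string and variable names are made up for this example). The greedy pattern grabs everything between the first > and the last <, while a stricter per-link pattern, which is an alternative and not part of the answer itself, collects each link text separately.

```javascript
var twoLinks = '<a href="a.com">one</a> and <a href="b.com">two</a>';

// The greedy pattern spans from the first ">" to the last "<".
var greedy = />(.*)</.exec(twoLinks)[1];
console.log(greedy); // one</a> and <a href="b.com">two

// Alternative sketch: match each <a>...</a> pair and capture text with no nested tags.
var perLink = /<a\b[^>]*>([^<]*)<\/a\s*>/gi;
var linkTexts = [];
var found;
while ((found = perLink.exec(twoLinks)) !== null) {
  linkTexts.push(found[1]);
}
console.log(linkTexts); // logs an array holding "one" and "two"
```

Even the per-link pattern assumes no nested tags inside the link and no > inside attribute values, so the DOM-based approach remains the safer choice for real HTML.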

Upvotes: 0
