saastn

Reputation: 6015

How to extract the hyperlink text from an <a> HTML tag?

Given a string containing 'blabla <a href="address">text</a> blabla', I want to extract 'text' from it.
The regexp documentation suggests the expression '<(\w+).*>.*</\1>', but it matches the whole <a> ... </a> element.
Of course, I can then use strfind to cut out the text, like this:

line = 'blabla <a href="address">text</a> blabla';
atag = regexp(line,'<(\w+).*>.*</\1>','match', 'once');
from = strfind(atag, '>');
to = strfind(atag, '<');
text = atag((from(1)+1):(to(2)-1))

But can I use a different expression to get text in one step?

Upvotes: 1

Views: 1519

Answers (3)

David

Reputation: 8308

You can use the extractHTMLText function in MATLAB (part of the Text Analytics Toolbox; see its documentation). Here is an example that gets the desired output:

line = 'blabla <a href="address">text</a> blabla';
l = split(extractHTMLText(line), ' ');
l{2}

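Note that extractHTMLText returns all of the text in the string, so taking the second word only works for this particular example. If you don't want to use a built-in function, you could use a regular expression as Nick suggested: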

line = 'blabla <a href="address">text</a> blabla';
% Each match yields a cell of tokens: {tag name, inner text}
tok = regexp(line, '<(\w+)[^>]*>(.*?)</\1>', 'tokens');
t = tok{1};   % tokens of the first match
t{2}          % the link text

and you'll get the desired output
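The same tokens approach can also collect every link text on a line at once, since each match contributes one {tag, text} cell. A minimal sketch, assuming a hypothetical input line with two links; the [^>]* in the opening tag and the lazy (.*?) keep each match inside a single element, whereas a greedy .* in the opening tag would merge both links into one match.

```matlab
% Hypothetical input with two links on one line
line = 'see <a href="x">first</a> and <a href="y">second</a> here';

% [^>]* stops the opening tag at its own '>', and the lazy (.*?) stops at
% the nearest closing tag with the same name (\1).
tok = regexp(line, '<(\w+)[^>]*>(.*?)</\1>', 'tokens');

% tok holds one {tagName, innerText} cell per match; keep the second token of each
texts = cellfun(@(t) t{2}, tok, 'UniformOutput', false);
disp(texts)   % texts is {'first', 'second'}
```

This still assumes the link text contains no nested element with the same tag name; for arbitrary HTML a real parser is safer than a regular expression.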

Upvotes: 1

Chin. Udara

Reputation: 744

If you are using jQuery, try this; no regex is required. However, it might hurt performance if the DOM is large.

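var line = 'blabla <a href="address">text</a> blabla';
// Parse the string inside a wrapper <div>: $(line) alone does not treat a string
// that doesn't start with '<' as HTML, and .find() only searches descendants.
var text = $("<div>").html(line).find("a").text();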

Upvotes: 0

Hamed Ghasempour

Reputation: 445

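You can simply use a capturing group.

Your updated pattern would be something like this (the text is in the second group):

<(\w+)[^>]*>(.*?)</\1>

and this one matches any pair of tags (the text is in the first group):

<[^>]+>([^<]*)<[^>]+>

Restricting the tag contents to [^>]* or [^>]+, and the text to a lazy (.*?) or [^<]*, stops a match from spanning several tags on the same line.

You can test these patterns on Regex101.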
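If only <a> elements matter, rather than any tag, a sketch along these lines may work (assuming, as in the question, that the link text contains no nested </a>). The non-capturing group (?:...) keeps the tag name out of the tokens, and 'once' returns the tokens of the first match as a flat cell, so a single {1} reaches the text. Note that in MATLAB \b means backspace rather than a word boundary, which is why the sketch uses (?:\s[^>]*)? to avoid matching tags such as <abbr>.

```matlab
line = 'blabla <a href="address">text</a> blabla';

% <a(?:\s[^>]*)?> matches "<a>" or "<a" followed by whitespace and attributes,
% so other tags that start with "a" (e.g. <abbr>) are skipped.
% (?:...) is non-capturing, so the only token is the lazy (.*?) link text.
tok = regexp(line, '<a(?:\s[^>]*)?>(.*?)</a>', 'tokens', 'once', 'ignorecase');

if isempty(tok)
    txt = '';        % no <a> element found
else
    txt = tok{1};    % first (and only) token of the first match
end
disp(txt)
```

The 'ignorecase' option also accepts <A HREF=...>...</A>; drop it if only lowercase tags should match.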

Upvotes: 1
