Reputation: 6015
Given a string containing 'blabla <a href="address">text</a> blabla', I want to extract 'text' from it.
The regexp documentation suggests the expression '<(\w+).*>.*</\1>', but it extracts the whole <a> ... </a> element.
Of course, I can continue using strfind like this:
line = 'blabla <a href="address">text</a> blabla';
atag = regexp(line,'<(\w+).*>.*</\1>','match', 'once');
from = strfind(atag, '>');
to = strfind(atag, '<');
text = atag((from(1)+1):(to(2)-1))
But can I use a different expression to get the text in one step?
Upvotes: 1
Views: 1519
Reputation: 8308
You can use the extractHTMLText function in MATLAB (part of the Text Analytics Toolbox); its documentation has the details.
An example that gets the desired output:
line = 'blabla <a href="address">text</a> blabla';
l = split(extractHTMLText(line), ' ');
l{2}
If you don't want to use a built-in function, you could use a regex, as Nick suggested.
line = 'blabla <a href="address">text</a> blabla';
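% Token 1 is the tag name, token 2 is the text between the tags
[atag,tok] = regexp(line,'<(\w+).*>(.*?)</\1>','match','tokens');
t = tok{1};   % tokens of the first match; MATLAB does not allow tok(1,1){1}
t{2}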
and you'll get the desired output
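If indexing into nested token cells feels awkward, named tokens may read more clearly. The following is only a sketch: the names tag and inner are arbitrary labels chosen for illustration, and the [^>]* is an assumed tightening of the pattern above, not part of the original answer.

```matlab
line = 'blabla <a href="address">text</a> blabla';
% (?<tag>...) and (?<inner>...) are named tokens; \k<tag> refers back to
% the tag name so the closing tag must match the opening one.
names = regexp(line, '<(?<tag>\w+)[^>]*>(?<inner>.*?)</\k<tag>>', 'names', 'once');
inner = names.inner;   % content between the opening and closing tag
disp(inner)
```

With the 'once' option, names is a scalar struct describing the first match only.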
Upvotes: 1
Reputation: 744
If you are using jQuery, try this. No regex is required, but it might hurt performance if the DOM is hefty.
// Wrap the string in a <div> so it is parsed as HTML rather than treated as a selector
var $jqueryobj = $("<div>").html(line);
var text = $jqueryobj.find("a").text();
Upvotes: 0
Reputation: 445
You can simply use a capturing group.
Your updated pattern would look something like this:
<(\w+).*>(.*)<\/\1>
and this one matches any tag:
<.*>(.*)<.*>
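As a minimal sketch of how the captured group can be read back in MATLAB: with the 'tokens' and 'once' options, regexp returns the groups of the first match as a cell array, so the link text is the second element. The [^>]* and the lazy (.*?) used here are an assumed tweak to the pattern above (not part of this answer) to keep the match from running across several tags.

```matlab
line = 'blabla <a href="address">text</a> blabla';
% Group 1 captures the tag name, group 2 the content between the tags.
% [^>]* and the lazy .*? are illustrative tweaks, not from the answer above.
tok = regexp(line, '<(\w+)[^>]*>(.*?)</\1>', 'tokens', 'once');
text = tok{2};
disp(text)   % displays: text
```

This replaces the strfind post-processing from the question with a single regexp call.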
Upvotes: 1