Reputation: 53
I have this html page:
<div class="abc">
<a href="www...." title="aaaaa">TEXTONE</a>
</div>
<div class="abc">
<a href="www...." title="bbbb">TEXTTWO</a>
</div>
Only the div classes are the same. I need to extract TEXTONE and TEXTTWO. How can I do this with the Find function? Thank you.
Upvotes: 0
Views: 3307
Reputation: 1609
An improvement on vs97's regex would be:
([\s\S])*?<a.*?>(.*?)<\/a>([\s\S])*?
with \2\n
as replacement!
Explanation:
([\s\S])*?
takes anything until the next pattern match, ungreedy
<a.*?>(.*?)<\/a>
takes an <a[...]>TEXT</a>
tag and saves the text
([\s\S])*?
ehm...see above! ;-)
If you replace it by \2\n
the second capture group, which is the text of the a-tag, will be placed there, followed by a newline, instead of the tag.
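For anyone who wants to try the same replacement outside Notepad++, here is a minimal sketch in Python. It assumes Python's `re` module behaves like the editor's engine for this pattern; the `html` sample is copied from the question and the variable names are only illustrative.

```python
import re

# Sample input copied from the question.
html = """<div class="abc">
<a href="www...." title="aaaaa">TEXTONE</a>
</div>
<div class="abc">
<a href="www...." title="bbbb">TEXTTWO</a>
</div>"""

# Same pattern as above: group 2 captures the link text.
pattern = re.compile(r'([\s\S])*?<a.*?>(.*?)<\/a>([\s\S])*?')

# Replace each match with group 2 followed by a newline.
result = pattern.sub(r'\2\n', html)
print(result)
```

Note that whatever follows the last `</a>` (here the final `</div>`) is not part of any match, so it stays in the output and may need a second cleanup pass.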
Upvotes: 0
Reputation: 5859
The correct way to do this would be to use a parser, but if you want quick and dirty regex to use in Find in Notepad++...
Try the following regex:
\w+(?=<\/a>) # match all [A-Za-z0-9_] before </a>
If the text may contain spaces, you can use the following regex:
(?<=>).+(?=<\/a>)
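To see how the two patterns differ, here is a small Python sketch, assuming the lookarounds behave the same way in Python's `re` module as in Notepad++. The third link with the text `TEXT THREE` is a hypothetical addition, not part of the question, included only to show the effect of a space.

```python
import re

# First two links are from the question; the third is a hypothetical
# example added to show link text that contains a space.
html = """<div class="abc">
<a href="www...." title="aaaaa">TEXTONE</a>
</div>
<div class="abc">
<a href="www...." title="bbbb">TEXTTWO</a>
</div>
<div class="abc">
<a href="www...." title="cccc">TEXT THREE</a>
</div>"""

# Word characters directly before </a>: stops at the space.
words_only = re.findall(r'\w+(?=<\/a>)', html)

# Everything between a preceding '>' and </a> on the same line.
with_spaces = re.findall(r'(?<=>).+(?=<\/a>)', html)

print(words_only)
print(with_spaces)
```

The second pattern is greedy, so a line holding several links would be matched as one long span; it is only safe when each link sits on its own line, as in the question.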
Upvotes: 4
Reputation: 91385
This matches all text in <a ...> tags that are inside <div class="abc">, with or without spaces or line breaks.
<div class="abc">\s+<a [^>]+>\K.+?(?=</a>)
Check the ". matches newline" option.
Explanation:
<div class="abc"> # literally
\s+ # 1 or more whitespace characters
<a [^>]+> # <a...> tag
\K # forget all we have seen until this position
.+? # 1 or more of any character, including newlines
(?=</a>) # positive lookahead, make sure a closing </a> tag follows
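The same idea can be sketched in Python, with two substitutions that are assumptions of this example rather than part of the answer: Python's standard `re` module does not support `\K`, so a capturing group takes its place, and the `re.DOTALL` flag stands in for the ". matches newline" checkbox.

```python
import re

# Sample input copied from the question.
html = """<div class="abc">
<a href="www...." title="aaaaa">TEXTONE</a>
</div>
<div class="abc">
<a href="www...." title="bbbb">TEXTTWO</a>
</div>"""

# Capturing group instead of \K; DOTALL lets '.' match newlines too.
pattern = re.compile(
    r'<div class="abc">\s+<a [^>]+>(.+?)(?=</a>)',
    re.DOTALL,
)

# findall returns the contents of the single capturing group.
texts = pattern.findall(html)
print(texts)
```

Because the pattern starts with the literal `<div class="abc">`, links inside other containers are skipped.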
Upvotes: 3
Reputation: 27723
I'm guessing that you may have other elements as well and that you probably want to find and replace. If that's the case, an expression similar to:
(<div class="abc">\s*<a\s+[^>]*>)(.+?)(<\/a>)
might work, with your desired output in $2.
If you wish to simplify, modify, or explore the expression, it is explained in the top-right panel of regex101.com. If you'd like, you can also watch in this link how it would match against some sample inputs.
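As a rough illustration of the three groups, here is a Python sketch. It assumes the expression carries over unchanged to Python's `re` module, where the second group is written `\2` in a replacement rather than `$2`; the variable names are only illustrative.

```python
import re

# Sample input copied from the question.
html = """<div class="abc">
<a href="www...." title="aaaaa">TEXTONE</a>
</div>
<div class="abc">
<a href="www...." title="bbbb">TEXTTWO</a>
</div>"""

# Group 1: opening div and a tags, group 2: link text, group 3: </a>.
pattern = re.compile(r'(<div class="abc">\s*<a\s+[^>]*>)(.+?)(<\/a>)')

# Read the link text out of group 2.
texts = [m.group(2) for m in pattern.finditer(html)]

# Or keep only group 2 in place of each match.
replaced = pattern.sub(r'\2', html)

print(texts)
print(replaced)
```

The closing `</div>` tags are outside the match, so they remain in `replaced`.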
Upvotes: 1