mauro

Reputation: 53

Extract the text inside <a href> tags in Notepad++

I have this HTML page:

<div class="abc">
<a href="www...." title="aaaaa">TEXTONE</a>
</div>

<div class="abc">
<a href="www...." title="bbbb">TEXTTWO</a>
</div>

Only the div classes are the same. I need to extract TEXTONE and TEXTTWO. How can I do this with the Find function? Thank you.

Upvotes: 0

Views: 3307

Answers (4)

csabinho

Reputation: 1609

An improvement on vs97's regex would be ([\s\S])*?<a.*?>(.*?)<\/a>([\s\S])*? with \2\n as the replacement.

Explanation:

([\s\S])*? matches any characters, including newlines, non-greedily up to the next part of the pattern

<a.*?>(.*?)<\/a> matches an <a[...]>TEXT</a> element and captures the text

([\s\S])*? ehm...see above! ;-)

If you replace it with \2\n, the second capture group, which is the text of the a-tag, is placed there, followed by a newline, instead of the tag.
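
As a rough check outside Notepad++, the same search-and-replace can be tried with Python's re module. This is only a sketch: Python's engine is not the one Notepad++ uses, and the variable names are made up for illustration. The sample input is the one from the question.

```python
import re

# Sample input copied from the question (the href values are elided there too).
html = '''<div class="abc">
<a href="www...." title="aaaaa">TEXTONE</a>
</div>

<div class="abc">
<a href="www...." title="bbbb">TEXTTWO</a>
</div>
'''

# Same pattern as in the answer; group 2 captures the link text.
pattern = re.compile(r'([\s\S])*?<a.*?>(.*?)<\/a>([\s\S])*?')

# Replace each match with the captured text followed by a newline.
result = pattern.sub(r'\2\n', html)
print(result)
```

Whatever follows the last </a> (here the final </div>) is not part of any match, so it stays in the output and would have to be removed separately.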

Upvotes: 0

vs97

Reputation: 5859

The correct way to do this would be to use a parser, but if you want a quick and dirty regex to use in Notepad++'s Find dialog...

Try the following regex:

\w+(?=<\/a>)            # match all [A-Za-z0-9_] before </a>

Regex Demo

If the text may contain spaces, you can use the following regex:

(?<=>).+(?=<\/a>)
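
To see how the two patterns differ, here is a small sketch using Python's re module rather than Notepad++ itself. The sample string is hypothetical (the second link text was given a space on purpose), and the variable names are illustrative only.

```python
import re

# Hypothetical sample; the second link text contains a space.
html = '''<div class="abc">
<a href="www...." title="aaaaa">TEXTONE</a>
</div>
<div class="abc">
<a href="www...." title="bbbb">TEXT TWO</a>
</div>
'''

# Word characters directly before </a>; a space cuts the match short.
words_only = re.findall(r'\w+(?=<\/a>)', html)

# Everything between a preceding '>' and </a> on the same line.
with_spaces = re.findall(r'(?<=>).+(?=<\/a>)', html)

print(words_only)
print(with_spaces)
```

Because .+ is greedy, two links on the same line would be merged into a single match, so the second pattern is best suited to input with one link per line, as in the question.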

Regex Demo
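
For comparison, the parser route mentioned at the start of this answer could look roughly like the following. It is a minimal sketch using Python's standard html.parser module; the class name LinkTextParser and its attributes are hypothetical, and it assumes the class attribute is exactly "abc".

```python
from html.parser import HTMLParser


class LinkTextParser(HTMLParser):
    """Collects the text of <a> elements nested in <div class="abc">."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0    # > 0 while inside a matching div
        self.in_link = False  # True while inside an <a> within such a div
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            # Track nested divs so the matching </div> is found correctly.
            if self.div_depth or dict(attrs).get("class") == "abc":
                self.div_depth += 1
        elif tag == "a" and self.div_depth:
            self.in_link = True
            self.texts.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth:
            self.div_depth -= 1
        elif tag == "a":
            self.in_link = False

    def handle_data(self, data):
        if self.in_link:
            self.texts[-1] += data


# Sample input copied from the question.
html = '''<div class="abc">
<a href="www...." title="aaaaa">TEXTONE</a>
</div>

<div class="abc">
<a href="www...." title="bbbb">TEXTTWO</a>
</div>
'''

parser = LinkTextParser()
parser.feed(html)
print(parser.texts)
```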

Upvotes: 4

Toto

Reputation: 91385

This matches all text inside <a ...> tags that sit inside <div class="abc">, whether or not the text contains spaces or line breaks.

  • Ctrl+F
  • Find what: <div class="abc">\s+<a [^>]+>\K.+?(?=</a>)
  • check Wrap around
  • check Regular expression
  • CHECK . matches newline
  • Find next

Explanation:

<div class="abc">   # literally
\s+                 # 1 or more whitespace characters (including line breaks)
<a [^>]+>           # opening <a ...> tag
\K                  # forget everything matched up to this position
.+?                 # 1 or more of any character, including newlines, non-greedy
(?=</a>)            # positive lookahead, make sure a closing </a> tag follows
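
The same idea can be tried outside Notepad++ with Python's re module, with one adjustment: Python's re has no \K, so this sketch swaps in a capture group to mark the part to keep. The re.DOTALL flag stands in for the ". matches newline" checkbox. The variable names are illustrative, and the sample input is the one from the question.

```python
import re

# Sample input copied from the question.
html = '''<div class="abc">
<a href="www...." title="aaaaa">TEXTONE</a>
</div>

<div class="abc">
<a href="www...." title="bbbb">TEXTTWO</a>
</div>
'''

# Python's re does not support \K, so a capture group marks the text to keep.
pattern = re.compile(
    r'<div class="abc">\s+<a [^>]+>(.+?)(?=</a>)',
    re.DOTALL,  # plays the role of ". matches newline"
)

# With a single capture group, findall returns just the captured text.
texts = pattern.findall(html)
print(texts)
```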

Upvotes: 3

Emma

Reputation: 27723

I'm guessing that you may have some other elements and that you want to find and replace. If so, an expression similar to:

(<div class="abc">\s*<a\s+[^>]*>)(.+?)(<\/a>)

might work, with your desired output in $2.
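
As a sketch of how the three groups can be used for find and replace, here is the same expression in Python's re module (not Notepad++, where the group is written $2; in Python it is read with m.group(2)). Lowercasing the link text is just a made-up example of a replacement, and the variable names are illustrative.

```python
import re

# Sample input copied from the question.
html = '''<div class="abc">
<a href="www...." title="aaaaa">TEXTONE</a>
</div>

<div class="abc">
<a href="www...." title="bbbb">TEXTTWO</a>
</div>
'''

pattern = re.compile(r'(<div class="abc">\s*<a\s+[^>]*>)(.+?)(<\/a>)')

# Group 2 is the link text.
texts = [m.group(2) for m in pattern.finditer(html)]

# Find/replace: keep the surrounding tags (groups 1 and 3) and change
# only the text in group 2 (lowercased here purely for illustration).
replaced = pattern.sub(
    lambda m: m.group(1) + m.group(2).lower() + m.group(3),
    html,
)

print(texts)
print(replaced)
```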

Demo


If you wish to simplify, modify, or explore the expression, it is explained in the top-right panel of regex101.com. If you'd like, you can also see in this link how it matches against some sample inputs.


Upvotes: 1
