Lucas Lachance

Reputation: 698

Regex to match only the first occurrence of an html element

Yes, yes, I know: "don't parse HTML with regex". I'm doing this in Notepad++ and it's a one-time thing, so please bear with me for a moment.

I'm trying to simplify some HTML code by using some more advanced techniques. Notably, I have "inserts" or "callouts" or whatever you call them, in my documentation, indicating "note", "warning" and "technical" short phrases to grab the attention of the reader on important information:

<div class="note">
    <p><strong>Notes</strong>: This icon shows you something that complements 
     the information around it. Understanding notes is not critical but 
     may be helpful when using the product.</p>
</div>
<div class="warning">
    <p><strong>Warnings</strong>: This icon shows information that may 
     be critical when using the product. 
     It is important to pay attention to these warnings.</p>
</div>
<div class="technical">
    <p><strong>Technical</strong>: This icon shows technical information 
     that may require some technical knowledge to understand. </p>
</div>

I want to simplify this HTML into the following:

<div class="box note"><strong>Notes</strong>: This icon shows you something that complements 
     the information around it. Understanding notes is not critical but 
     may be helpful when using the product.</div>
<div class="box warning"><strong>Warnings</strong>: This icon shows information that may 
     be critical when using the product. 
     It is important to pay attention to these warnings.</div>
<div class="box technical"><strong>Technical</strong>: This icon shows technical information 
     that may require some technical knowledge to understand.</div>

I almost have the regex I need for a global search and replace across my project in Notepad++, but it doesn't match only the first div; it matches all of them at once. If my cursor is at the beginning of the file and I click Find, the selection runs from the first <div class="something"> all the way to the last </div>.

Here's my expression: <div class="(.*[^"])">[^<]*<p>(.*?)<\/p>[^<]*<\/div> (Notepad++ effectively supplies the surrounding / / delimiters itself).

What am I doing wrong here?

Upvotes: 1

Views: 1121

Answers (1)

Alex Shesterov

Reputation: 27575

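The greedy dot quantifier (.*) in the group that matches the class attribute is what's causing your problem. Since the dot is evidently matching newlines here (your <p> content spans several lines), .* consumes as much as it can and only backtracks to the last "> that still lets the rest of the pattern match, so the match runs from the first <div> to the last </div>.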

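Make it non-greedy: <div class="(.*?[^"])"> or, better, change it to a character class: <div class="([^"]*)">. The character class is the more robust of the two because it can never run past the closing quote of the attribute.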
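To make the difference concrete, here is a minimal sketch in Python rather than Notepad++. It assumes that Python's re.DOTALL flag is a fair stand-in for Notepad++'s ". matches newline" option, and it uses a shortened version of the HTML from the question. The variable names (html, greedy, fixed, result) are only illustrative.

```python
import re

# Shortened sample of the callout markup from the question.
html = '''<div class="note">
    <p><strong>Notes</strong>: first callout.</p>
</div>
<div class="warning">
    <p><strong>Warnings</strong>: second callout.</p>
</div>
'''

# Original pattern: the greedy .* in the class group can cross tags and lines.
greedy = re.compile(
    r'<div class="(.*[^"])">[^<]*<p>(.*?)</p>[^<]*</div>', re.DOTALL)

# Fixed pattern: [^"]* stops at the closing quote of the attribute.
fixed = re.compile(
    r'<div class="([^"]*)">[^<]*<p>(.*?)</p>[^<]*</div>', re.DOTALL)

# The greedy version produces one match that swallows both callouts.
greedy_match = greedy.search(html)
print(greedy_match.group(0).count('<div'))  # 2

# The fixed version matches each callout separately.
fixed_matches = fixed.findall(html)
print([cls for cls, _ in fixed_matches])  # ['note', 'warning']

# Rewrite each callout into the simplified form asked for in the question.
result = fixed.sub(r'<div class="box \1">\2</div>', html)
print(result)
```

In Notepad++ the equivalent replacement string would presumably be <div class="box \1">\2</div> in the "Replace with" field, with "Regular expression" selected; try it on a copy of one file before running it across the whole project.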

Upvotes: 2
