Myna

Reputation: 559

Perl regex that matches the first substring specified

I need to extract data from an HTML document and compose an XML document containing only the interesting information. I'm doing this by transforming the HTML document into an XML document, step by step. I have the 5 outermost XML tags on one line each, and now I'm trying to structure what's inside them.

I have a line that's structured this way:

   <myTag> 
      blablabla <a href="link/I/want" *some css* > title I want </a> some other stuff <a href="link that/I/don't/want" *some css*> text I don't want </a> blablabla 
   </myTag>

What I want is:

    <myTag>
    <link>link/I/want</link>
    <title> title I want </title>
    </myTag>

The regex I have is:

    /a href="(.*)"(.*)>(.*)<\/a>/

hoping to get $1 = url, $2 = whatever, $3 = title.

This isn't working because it's taking this instead:

    <myTag>
    <link>link/I/want *some css* > title I want </a> some other stuff <a href="link that/I/don't/want" *some css*</link>
    <title>text I don't want</title>
    </myTag>

How do I extract the content of the FIRST anchor tag on the line?

Thanks !

Upvotes: 1

Views: 168

Answers (1)

Igor Chubin

Reputation: 64623

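Just use non-greedy quantifiers:

    /a href="(.*?)"(.*?)>(.*?)<\/a>/

Note the ? after each *. A plain * is greedy and matches as much as it can, so the first group runs on to the last closing quote on the line; *? matches as little as possible, so each group stops at the first quote, > or </a> that lets the rest of the pattern match.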
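
As a minimal sketch of how this behaves, assuming a sample line modelled on the one in the question (the class="x" and class="y" attributes are placeholders standing in for *some css*, and the variable names are only illustrative):

```perl
use strict;
use warnings;

# Sample input modelled on the question; the attribute values are made up.
my $line = q{blablabla <a href="link/I/want" class="x"> title I want </a> some other stuff <a href="link that/I/don't/want" class="y"> text I don't want </a> blablabla};

my ($link, $title);

# Each *? matches as little as possible, so the groups end at the first
# closing quote, the first '>' and the first '</a>' after the match start.
if ($line =~ /<a href="(.*?)"(.*?)>(.*?)<\/a>/) {
    ($link, $title) = ($1, $3);    # $2 holds the leftover attributes
    print "<link>$link</link>\n";
    print "<title>$title</title>\n";
}
```

A common alternative is to use negated character classes for the attribute parts, for example /<a href="([^"]*)"[^>]*>(.*?)<\/a>/, which cannot run past a quote or a > no matter how the regex engine backtracks. Either way, this only holds up for lines as regular as the one shown; for arbitrary HTML a real parser is usually the safer choice.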

Upvotes: 3
