Reputation: 731
I am using a regular expression to extract both text1 and text2 from the following HTML. Here is the pattern I am using:
/<div\s?class="right-col">[\s\n\S]*<a[\s\n]?[^>]*>@(.*)<\/a>/
but it misses text1 and captures only text2 (here is the link to my problem).
<div class="right-col">
<h1>
<a href="url-link-here" title="title-here">title1</a>
</h1>
<p>some text here</p>
<div class="some-class">
<div class="left">
<span><a href="url-link-here" class="breaking" target="_blank">some text here </a></span>
</div>
<div class="postmeta"><a href="url-link-here" >@text1</a> </div>
</div>
<div class="right-col">
<h1>
<a href="url-link-here" title="title-here">title2</a>
</h1>
<p>some text here</p>
<div class="some-class">
<div class="left">
<span><a href="url-link-here" class="breaking" target="_blank">some text here </a></span>
</div>
<div class="postmeta"><a href="url-link-here" >@text2</a> </div>
</div>
Can you tell me what is wrong with my regular expression? Is there a better way to capture title1 and title2 as well as text1 and text2?
Upvotes: 0
Views: 101
Reputation: 186
This is a fairly common issue: regular-expression quantifiers are greedy by default. [\s\S]* (the \n is redundant) consumes as much input as it can, so it runs past the first <a ...>@ link and backtracks only as far as the last one, which is why you get only text2. Adding a ? after the * makes the quantifier lazy, so each match stops at the first @ link it reaches, and using your link this returns both text1 and text2.
The short answer is to replace [\s\n\S]* with [\s\S]*?, but as others have mentioned, this is probably not a good use of regular expressions.
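The difference between the two quantifiers can be checked with a short PHP sketch. The markup below is a trimmed-down stand-in for the HTML in the question, and the variable names are only illustrative. It assumes each link sits on its own line, as in the question, because (.*) does not cross newlines.

```php
<?php
// Trimmed-down stand-in for the markup in the question (two blocks).
$html = <<<HTML
<div class="right-col">
<h1><a href="url-link-here" title="title-here">title1</a></h1>
<div class="postmeta"><a href="url-link-here" >@text1</a> </div>
</div>
<div class="right-col">
<h1><a href="url-link-here" title="title-here">title2</a></h1>
<div class="postmeta"><a href="url-link-here" >@text2</a> </div>
</div>
HTML;

// Original pattern: greedy [\s\n\S]* swallows everything up to the last "@" link.
$greedy = '/<div\s?class="right-col">[\s\n\S]*<a[\s\n]?[^>]*>@(.*)<\/a>/';
// Suggested fix: lazy [\s\S]*? stops at the first "@" link after each opening div.
$lazy = '/<div\s?class="right-col">[\s\S]*?<a[\s\n]?[^>]*>@(.*)<\/a>/';

preg_match_all($greedy, $html, $greedyMatches);
preg_match_all($lazy, $html, $lazyMatches);

print_r($greedyMatches[1]); // one capture: text2
print_r($lazyMatches[1]);   // two captures: text1, text2
```

Even with the lazy quantifier, the pattern still depends on the exact attribute order and line layout of the markup, which is why a parser is usually the safer choice.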
Upvotes: 0
Reputation: 227310
Using a regular expression here is not the best way to do it. Parsing HTML with a regex is fragile and generally considered bad practice; you should use a DOM/XML parser instead.
I like using PHP's DOMDocument class. Using XPath, we can quickly find the elements you want:
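// $html holds the markup from the question.
$dom = new DOMDocument;
$dom->loadHTML($html);
$xPath = new DOMXPath($dom);
// Select every <a> inside div.some-class whose text starts with "@".
$aTags = $xPath->query('//div[@class="some-class"]//a[starts-with(text(), "@")]');
foreach ($aTags as $a) {
    echo $a->nodeValue, "\n";
}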
DEMO: http://codepad.viper-7.com/QHOXzH
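The question also asks for title1 and title2. The same approach can be extended to collect each title with its @ text; the following is only a sketch, assuming the DOM extension is available and that every right-col block has one h1 link and one postmeta link. The sample markup is shortened from the question with the outer div closed, and the $pairs array and its layout are illustrative choices rather than part of the original answer.

```php
<?php
// Shortened version of the question's markup, with the outer divs closed.
$html = <<<HTML
<div class="right-col">
<h1>
<a href="url-link-here" title="title-here">title1</a>
</h1>
<div class="some-class">
<div class="postmeta"><a href="url-link-here" >@text1</a> </div>
</div>
</div>
<div class="right-col">
<h1>
<a href="url-link-here" title="title-here">title2</a>
</h1>
<div class="some-class">
<div class="postmeta"><a href="url-link-here" >@text2</a> </div>
</div>
</div>
HTML;

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom = new DOMDocument;
$dom->loadHTML($html);
libxml_clear_errors();
$xPath = new DOMXPath($dom);

$pairs = [];
foreach ($xPath->query('//div[@class="right-col"]') as $block) {
    // Title link: the <a> directly inside this block's <h1>.
    $title = $xPath->query('./h1/a', $block)->item(0);
    // "@" link: the first postmeta <a> whose text starts with "@".
    $text = $xPath->query('.//div[@class="postmeta"]/a[starts-with(normalize-space(.), "@")]', $block)->item(0);
    if ($title !== null && $text !== null) {
        // Strip the leading "@" so the value matches what the regex captured.
        $pairs[trim($title->nodeValue)] = ltrim(trim($text->nodeValue), '@');
    }
}

print_r($pairs); // title1 => text1, title2 => text2
```

Querying relative to each block keeps a title paired with the text from the same block, which a single document-wide query would not guarantee.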
Upvotes: 2