Reputation: 858
I have data like this:
$data = '<a href="not important"><span class="theclass">data (not important)</span></a> <span class="anotherclass">extra data (October 1, 2010)</span>';
I want to get the date inside the parentheses, so I've used the following preg_match:
preg_match("/\((([a-zA-Z]{5,10} .*?)|(\d{4}))\)/i",$data,$res);
Please note that 'October 1' is sometimes absent, BUT THE YEAR IS ALWAYS PRESENT, hence the OR condition. The problem is that it gives me an array of 3 elements in this case. I know that's because of the three sets of parentheses I have for the conditions. Is there a better, cleaner way to achieve this?
Second case (year only):
$data = '<a href="not important"><span class="theclass">data</span></a> <span class="theother">data <a href="not important">data</a> (2009)</span>';
Thanks guys
Upvotes: 1
Views: 276
Reputation: 7880
Use lookarounds
Here we make sure there is a preceding ( character, then we look for text we would expect in a date formatted like your example. This bit of the pattern, ([A-Za-z ,\d]+)?, allows letters, a literal space, a comma, and digits. The + means at least one character. Because the class is limited to those characters, it cannot run past the closing parenthesis the way .* or .+ could. I'm surrounding it with parentheses and adding a ? to make it optional. Logically it works like your | (or) alternation, because it still finds the year, but PHP doesn't have to evaluate a second alternative. Then we find the year (always 4 digits, {4}) and check that it's followed by a literal ) character. The lookbehind (?<=\() and the lookahead (?=\)) must match, but they are not included in the match result, leaving your answer clean.
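To see what the lookarounds buy you, here is a minimal sketch that runs the same pattern twice on a shortened piece of your data: once with the parentheses matched literally, once with them moved into lookarounds. The variable names ($sample, $withParens, $withLookarounds) are just for illustration.

```php
<?php
// Shortened sample based on the question's data.
$sample = 'extra data (October 1, 2010)';

// Parentheses matched literally: they become part of the match.
preg_match('!\(([A-Za-z ,\d]+)?[\d]{4}\)!', $sample, $withParens);

// Parentheses checked by lookbehind/lookahead: asserted, but not consumed.
preg_match('!(?<=\()([A-Za-z ,\d]+)?[\d]{4}(?=\))!', $sample, $withLookarounds);

echo $withParens[0], "\n";      // (October 1, 2010)
echo $withLookarounds[0], "\n"; // October 1, 2010
```

With the literal version you would have to trim the parentheses yourself or read a capture group instead of element 0.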
preg_match() fills the array passed as its third argument, and element 0 of that array holds the full match, so that's the element we read. If you're looking for multiple matches in the same string you can use preg_match_all:
$data = '<a href="not important">
<span class="theclass">data (not important)</span></a>
<span class="anotherclass">extra data (October 1, 2010)</span>
<span class="anotherclass">extra data (2011)</span>';
$pattern = '!(?<=\()([A-Za-z ,\d]+)?[\d]{4}(?=\))!';
$res = preg_match_all($pattern,$data,$myDate);
print_r($myDate[0]);
Output
Array
(
[0] => October 1, 2010
[1] => 2011
)
If you're only looking for one match you would change the code to this:
$res = preg_match($pattern,$data,$myDate);
echo($myDate[0]);
Output
October 1, 2010
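For the year-only case from your question, the same pattern should work unchanged. This is a small sketch that also checks the return value of preg_match() before reading the array. The fallback message is my own addition, not something from your code.

```php
<?php
// Second sample from the question: only the year appears in parentheses.
$data = '<a href="not important"><span class="theclass">data</span></a> '
      . '<span class="theother">data <a href="not important">data</a> (2009)</span>';

$pattern = '!(?<=\()([A-Za-z ,\d]+)?[\d]{4}(?=\))!';

// preg_match() returns 1 on a match, 0 on no match, and false on error.
if (preg_match($pattern, $data, $myDate) === 1) {
    echo $myDate[0], "\n"; // 2009
} else {
    echo "no date found\n"; // hypothetical fallback for illustration
}
```

Checking the return value avoids an undefined-index notice when the string contains no parenthesized date.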
Another way to write the pattern would be like this: we've removed the capturing parentheses and the + quantifier followed by the optional ?, but kept the character class. A * quantifier (zero or more) then makes the class optional. The difference is that preg_match and preg_match_all also store every capturing group in the result array. Since there is no group here, no extra array elements are stored.
$pattern = '!(?<=\()[A-Za-z ,\d]*[\d]{4}(?=\))!';
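As a rough comparison of the two patterns above, this sketch runs both against the same shortened input and prints the resulting arrays. The variable names ($grouped, $ungrouped, $withGroup, $withoutGroup) are made up for the example.

```php
<?php
// Shortened sample based on the question's data.
$data = 'extra data (October 1, 2010)';

$grouped   = '!(?<=\()([A-Za-z ,\d]+)?[\d]{4}(?=\))!';
$ungrouped = '!(?<=\()[A-Za-z ,\d]*[\d]{4}(?=\))!';

preg_match($grouped, $data, $withGroup);
preg_match($ungrouped, $data, $withoutGroup);

print_r($withGroup);    // two elements: the full match plus capture group 1
print_r($withoutGroup); // one element: the full match only
```

In both cases element 0 is the full date, so the ungrouped pattern only trims the extra entry from the array.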
Upvotes: 2