mega-crazy

Reputation: 858

preg_match for a date within parentheses where the month and day are sometimes absent

I have data like so:

 $data =  '<a href="not important"><span class="theclass">data (not important)</span></a> <span class="anotherclass">extra data (October 1, 2010)</span>';

I want to get the date within the parentheses, so I've written the following preg_match:

preg_match("/\((([a-zA-Z]{5,10} .*?)|(\d{4}))\)/i",$data,$res);

Please note that sometimes 'October 1' is not present, BUT THE YEAR IS ALWAYS PRESENT, hence the OR condition. The thing is, it gives me an array of 3 elements in this case. I know it's because of the 3 sets of parentheses I have for the conditions. Is there a better and cleaner way to achieve this?

Second case (year only):

   $data = '<a href="not important"><span class="theclass">data</span></a> <span class="theother">data <a href="not importand">data</a>  (2009)</span>';

Thanks guys

Upvotes: 1

Views: 276

Answers (1)

AbsoluteƵERØ

Reputation: 7880

Use lookarounds

Here we first make sure there is a preceding ( character, then look for text that could appear in a date formatted like your example. The fragment ([A-Za-z ,\d]+)? allows letters, a literal space, a comma, and digits. The + means at least one such character. The quantifier is still greedy, but the character class is far more restrictive than .* or .+, so it cannot run past the closing parenthesis. I'm surrounding it with parentheses and adding a ? to make it optional. Logically it works like your | alternation because it will still find the year on its own, but without a second alternative for PHP to try. Then we match the year (always 4 digits, {4}) and check that it's followed by a literal ) character. The lookbehind (?<=\() and the lookahead (?=\)) must both succeed, but they are not included in the match results, leaving your answer clean.

preg_match() returns 1 or 0 (not an array) and fills the array passed as its third argument, so we read the first element of that array, which holds the full match. If you're looking for multiple matches in the same string you can use preg_match_all().
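A minimal sketch of how the result is read: preg_match() returns a match count and writes the matched text into its third argument. The variable names $text, $found, $m, $count, and $all are illustrative and not part of the original answer.

```php
<?php
// Illustrative input; only the parenthesised year matters here.
$text = '<span class="theother">data (2009)</span>';
$pattern = '!(?<=\()([A-Za-z ,\d]+)?[\d]{4}(?=\))!';

// preg_match() returns 1 on a match, 0 on no match, or false on error.
$found = preg_match($pattern, $text, $m);

if ($found === 1) {
    // $m[0] holds the full match; the surrounding parentheses are
    // excluded because lookarounds do not consume characters.
    echo $m[0], "\n";
}

// preg_match_all() returns the number of full matches instead.
$count = preg_match_all($pattern, $text . ' (October 1, 2010)', $all);
echo $count, "\n";
```

Checking the return value before reading $m[0] avoids an undefined-index notice when the string contains no date.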

$data =  '<a href="not important">
   <span class="theclass">data (not important)</span></a>
   <span class="anotherclass">extra data (October 1, 2010)</span>
   <span class="anotherclass">extra data (2011)</span>';
$pattern = '!(?<=\()([A-Za-z ,\d]+)?[\d]{4}(?=\))!';
$res = preg_match_all($pattern,$data,$myDate);

print_r($myDate[0]);

Output

Array
(
    [0] => October 1, 2010
    [1] => 2011
)

If you're only looking for one match you would change the code to this:

$res = preg_match($pattern,$data,$myDate);

echo($myDate[0]);

Output

October 1, 2010

Another way to write the pattern is shown below. We've removed the capturing parentheses together with the + quantifier and the ? that made the group optional, but kept the character class itself. A * (zero or more) now makes that part optional. The difference matters because preg_match and preg_match_all also store every capturing group in the matches array. Since there is no group here, no extra array elements are stored.

$pattern = '!(?<=\()[A-Za-z ,\d]*[\d]{4}(?=\))!';
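To see the difference in the matches array, here is a small comparison of the two patterns on one of the sample strings. The variable names $grouped, $ungrouped, $withGroup, and $withoutGroup are made up for this sketch.

```php
<?php
$data = '<span class="anotherclass">extra data (October 1, 2010)</span>';

// Pattern with an optional capturing group.
$grouped = '!(?<=\()([A-Za-z ,\d]+)?[\d]{4}(?=\))!';
// Same logic without a capturing group.
$ungrouped = '!(?<=\()[A-Za-z ,\d]*[\d]{4}(?=\))!';

preg_match($grouped, $data, $withGroup);
preg_match($ungrouped, $data, $withoutGroup);

// $withGroup has two elements: the full match plus the captured
// prefix "October 1, " (everything before the year).
print_r($withGroup);

// $withoutGroup has only the full match.
print_r($withoutGroup);
```

If the grouping is still wanted for readability, a non-capturing group written as (?:[A-Za-z ,\d]+)? should also keep the extra element out of the array.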

Upvotes: 2
