AndrewB

Reputation: 836

PHP: Preg_Match_All strange behaviour

I am matching a pattern against a large string with preg_match_all. It finds the correct matches, but it then seems to go into each of those matches and look for more matches, then into each of those, stopping only when it reaches an empty string. It looks like some sort of recursion, which I neither need nor want. Is there a way to stop it?

Thank you for any help!

function getCategories($source)
{
    $categories = array();

    $pattern = "~<span class=.*\n<table class=.*\n<tr>\n<th.*\n<.th>\n<th.*\n<.th>\n<th.*\n<.th>\n<th.*\n<.th>\n<th.*\n<.th>\n<th.*\n<.th><.tr>\n(<tr id=.*\n(.*\n){6}<.td><.tr>(<.table>)?\n)*~";

    preg_match_all($pattern, $source, $categories);

    return $categories;
}

$categories = getCategories($source);

print_r($categories);

Upvotes: 0

Views: 81

Answers (1)

Karan Punamiya

Reputation: 8863

This happens because the pattern uses .* in several places.

That term is greedy: it matches as much as it can, so it can correspond to a string of any length and can span multiple tr tags in the example.

Use the non-greedy version, .*?, instead; that should do the trick.
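
To illustrate the difference, here is a minimal sketch comparing the greedy and non-greedy forms. The $html string is a made-up sample, not the asker's real input, and the patterns are simplified for the demonstration.

```php
<?php
// Hypothetical sample markup: two rows on a single line.
$html = '<tr id="a"><td>1</td></tr><tr id="b"><td>2</td></tr>';

// Greedy: .* grabs as much as it can, so a single match swallows both rows.
preg_match_all('~<tr id=.*</tr>~', $html, $greedy);

// Non-greedy: .*? stops at the first </tr>, giving one match per row.
preg_match_all('~<tr id=.*?</tr>~', $html, $lazy);

// A capturing group adds a second sub-array next to the full matches:
// $ids[0] holds the full matches, $ids[1] holds the captured id values.
preg_match_all('~<tr id="(.*?)">~', $html, $ids);

print_r($greedy[0]);
print_r($lazy[0]);
print_r($ids);
```

Note that preg_match_all always fills index 0 with the full matches and adds one further sub-array per capturing group, so some nesting in the print_r output is expected even with a correct pattern. If a group is only there for repetition, writing it as (?:...) keeps it out of the result.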

Note: As suggested, the best approach for what you are attempting is to parse the markup as a DOM structure (DOMElement nodes) or as XML, rather than matching it with a regex.
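
A rough sketch of the DOM approach, assuming PHP's bundled DOM extension is available. The $source markup, the row ids, and the choice to collect each row's text are all invented for illustration; the real page structure is not shown in the question.

```php
<?php
// Hypothetical sample markup standing in for the real $source.
$source = '<table class="cats">'
        . '<tr id="row1"><td>Books</td></tr>'
        . '<tr id="row2"><td>Music</td></tr>'
        . '</table>';

// Collect libxml warnings internally instead of emitting them,
// since real-world HTML is rarely perfectly formed.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($source);
libxml_clear_errors();

// Select every <tr> that carries an id attribute.
$xpath = new DOMXPath($doc);
$rows = [];
foreach ($xpath->query('//tr[@id]') as $tr) {
    // Each $tr is a DOMElement; key the row text by its id.
    $rows[$tr->getAttribute('id')] = trim($tr->textContent);
}

print_r($rows);
```

With this approach the table structure is handled by the parser, so there is no need to count newlines or th tags in a pattern.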

Upvotes: 1
