mulllhausen

Reputation: 4435

PHP regex: optionally match a whole word

I'm using PHP and I need to scrape some information from cURL responses from a site. I am simulating both an AJAX request and a normal (entire) page request made by a browser; however, the AJAX response differs slightly from the entire-page response in this section of the HTML.

The AJAX response is: <div id="accountProfile"><h2>THIS IS THE BIT I WANT</h2><dl id="accountProfileData">

However, the normal response is: <div id="accountProfile"><html xmlns="http://www.w3.org/1999/xhtml"><h2>THIS IS THE BIT I WANT</h2><dl id="accountProfileData">

I.e. the AJAX response is missing the tag <html xmlns="http://www.w3.org/1999/xhtml">. I need to get the text between the h2 tags. Obviously I can't just scrape the page for <h2>THIS IS THE BIT I WANT</h2><dl id="accountProfileData">, since these tags may occur in other places and not contain the information I want.

I can match either one of the patterns individually, but I would like to handle both with a single regex. Here is my solution for matching the AJAX response:

<?php
$pattern = '/\<div id="accountProfile"\>\<h2\>(.+?)\<\/h2\>\<dl id="accountProfileData"\>/';
preg_match($pattern, $haystack, $matches);
print_r($matches);
?>

Can someone show me how I should alter the pattern to optionally match the <html xmlns="http://www.w3.org/1999/xhtml"> tag as well? If it helps to simplify the haystack for the sake of brevity, that's fine.

Upvotes: 4

Views: 379

Answers (1)

Dev.Jaap

Reputation: 151

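I haven't tested it, but you can try this. It uses `#` as the delimiter so the slashes in the URL don't end the pattern early, and a non-capturing optional group `(?:...)?` so the text you want stays in `$matches[1]`:

    $pattern = '#<div id="accountProfile">(?:<html xmlns="http://www\.w3\.org/1999/xhtml">)?<h2>(.+?)</h2><dl id="accountProfileData">#';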
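
As a rough check, here is a minimal sketch that runs an optional-group pattern against both responses quoted in the question. It assumes `#` delimiters (so the slashes in the URL need no escaping) and a non-capturing group for the optional tag. The helper name `extractHeading` and the variable names are made up for illustration and do not come from the original post.

```php
<?php
// Sample haystacks copied from the question.
$ajax = '<div id="accountProfile"><h2>THIS IS THE BIT I WANT</h2><dl id="accountProfileData">';
$full = '<div id="accountProfile"><html xmlns="http://www.w3.org/1999/xhtml"><h2>THIS IS THE BIT I WANT</h2><dl id="accountProfileData">';

// '#' delimiters mean the '/' characters in the URL are ordinary characters.
// (?:...)? makes the <html ...> tag optional without adding a capture group,
// so the heading text is always capture group 1.
$pattern = '#<div id="accountProfile">(?:<html xmlns="http://www\.w3\.org/1999/xhtml">)?<h2>(.+?)</h2><dl id="accountProfileData">#';

// Hypothetical helper: returns the captured heading, or null when nothing matches.
function extractHeading(string $pattern, string $haystack): ?string
{
    return preg_match($pattern, $haystack, $matches) === 1 ? $matches[1] : null;
}

echo extractHeading($pattern, $ajax), PHP_EOL;
echo extractHeading($pattern, $full), PHP_EOL;
```

Both calls should print the heading text. If the markup varies more than this (extra whitespace, reordered attributes), an HTML parser is probably a safer choice than a regex.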

Upvotes: 2
