Reputation: 1833
I have this regex, but it's not working, perhaps because of the newlines. My goal is to extract the passenger's name and phone number.
Here is a snippet of the data I have. The block below repeats about 100 times:
<div class="booking-section">
<h4>Passenger Details</h4>
<p>
<b>Passenger Name:</b><br />
Ms Wendy Walker-hunter
</p>
<p>
<b>Mobile Number:</b><br />
161525961468
</p>
For now I'm just trying to get the passenger's name:
$re = '/(?<=Name)(.*)(?=Mobile)/s';
preg_match($re, $str, $matches);
// Print the entire match result
print_r($matches);
Any kind of help I can get on this is greatly appreciated!
Upvotes: 0
Views: 40
Reputation: 42681
Never parse HTML with a regular expression. Here's how you should be doing this sort of thing:
$html = '<div class="booking-section">
<h4>Passenger Details</h4>
<p>
<b>Passenger Name:</b><br />
Ms Wendy Walker-hunter
</p>
<p>
<b>Mobile Number:</b><br />
161525961468
</p>
</div>
<div class="booking-section">
<h4>Passenger Details</h4>
<p>
<b>Passenger Name:</b><br />
Mr John Walker
</p>
<p>
<b>Mobile Number:</b><br />
16153682486
</p>
</div>
';
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$results = $xpath->query("//div[@class='booking-section']/p[1]/text()[normalize-space()]");
foreach ($results as $node) {
echo trim($node->textContent) . "\n";
}
This uses an XPath query to get the nodes you're looking for:
//div[@class='booking-section']/p[1]/text()[normalize-space()]
This tells it to select bare text nodes from the first <p> element inside a <div> with a class attribute of "booking-section".
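Since the question asks for the phone number as well, the same idea can be extended to read both fields per booking. This is a minimal sketch, assuming every booking section has the name in its first <p> and the number in its second, as in the sample; the $passengers array and its 'name'/'phone' keys are names chosen here for illustration.

```php
<?php
// Two sample bookings in the same shape as the question's data.
$html = '<div class="booking-section">
<h4>Passenger Details</h4>
<p>
<b>Passenger Name:</b><br />
Ms Wendy Walker-hunter
</p>
<p>
<b>Mobile Number:</b><br />
161525961468
</p>
</div>
<div class="booking-section">
<h4>Passenger Details</h4>
<p>
<b>Passenger Name:</b><br />
Mr John Walker
</p>
<p>
<b>Mobile Number:</b><br />
16153682486
</p>
</div>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$passengers = [];
foreach ($xpath->query("//div[@class='booking-section']") as $section) {
    // Relative expressions are evaluated against the current <div>.
    // string(...) yields the first non-blank text node of each paragraph.
    $name  = $xpath->evaluate("string(p[1]/text()[normalize-space()])", $section);
    $phone = $xpath->evaluate("string(p[2]/text()[normalize-space()])", $section);
    $passengers[] = ['name' => trim($name), 'phone' => trim($phone)];
}

print_r($passengers);
```

Looping over the sections first keeps each name paired with the number from the same booking, which two separate flat queries would not guarantee if a field were missing somewhere.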
According to the documentation for DOMDocument::loadHTML(), "this function may generate E_WARNING errors when it encounters bad markup. libxml's error handling functions may be used to handle these errors."
I've enabled libxml's internal error handling for this example, to suppress any warnings about the HTML, though of course you should not be outputting warnings to users anyway.
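If you would rather inspect the parser's complaints than silently discard them, libxml's error functions let you read the buffered errors after loading. A rough sketch follows; the malformed $html string is made up for illustration, and whether a given piece of markup is reported at all depends on the libxml version PHP was built against.

```php
<?php
// Hypothetical, deliberately sloppy markup: <booking> is not an HTML element.
$html = '<div class="booking-section"><booking><p>Ms Wendy Walker-hunter</p></div>';

libxml_use_internal_errors(true);   // buffer parser errors instead of raising warnings
$dom = new DOMDocument();
$dom->loadHTML($html);

$errors = libxml_get_errors();      // array of LibXMLError objects (possibly empty)
foreach ($errors as $error) {
    // Send the details to the error log rather than to the page.
    error_log(sprintf('libxml [line %d]: %s', $error->line, trim($error->message)));
}

libxml_clear_errors();              // empty the buffer so it does not grow in a loop
```

Calling libxml_clear_errors() matters when you parse many documents in one request, since the buffer otherwise keeps accumulating entries.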
Upvotes: 1
Reputation: 39
This should work as long as the snippets are always formatted like the example, because it relies on the line breaks:
$t = '
<div class="booking-section">
<h4>Passenger Details</h4>
<p>
<b>Passenger Name:</b><br />
Ms Wendy Walker-hunter
</p>
<p>
<b>Mobile Number:</b><br />
161525961468
</p>
</div>';
// [^\r\n] matches any character except a line break (a "?" inside the class would be taken literally)
preg_match('/Passenger Name:[^\r\n]+\r?\n([^\r\n]+)\r?\n/', $t, $name);
preg_match('/Mobile Number:[^\r\n]+\r?\n([^\r\n]+)\r?\n/', $t, $phone);
echo trim($name[1]), ' / ', trim($phone[1]);
Output is: Ms Wendy Walker-hunter / 161525961468
Same with preg_match_all:
$t = '
<div class="booking-section">
<h4>Passenger Details</h4>
<p>
<b>Passenger Name:</b><br />
Ms Wendy Walker-hunter
</p>
<p>
<b>Mobile Number:</b><br />
161525961468
</p>
</div>
<div class="booking-section">
<h4>Passenger Details</h4>
<p>
<b>Passenger Name:</b><br />
Ms Wendy Walker-hunter 2
</p>
<p>
<b>Mobile Number:</b><br />
161525961468 2
</p>
</div>
<div class="booking-section">
<h4>Passenger Details</h4>
<p>
<b>Passenger Name:</b><br />
Ms Wendy Walker-hunter 3
</p>
<p>
<b>Mobile Number:</b><br />
161525961468 3
</p>
</div>';
// [^\r\n] matches any character except a line break (a "?" inside the class would be taken literally)
preg_match_all('/Passenger Name:[^\r\n]+\r?\n([^\r\n]+)\r?\n/', $t, $name);
preg_match_all('/Mobile Number:[^\r\n]+\r?\n([^\r\n]+)\r?\n/', $t, $phone);
echo '<pre>';
print_r($name);
print_r($phone);
die;
Output is something like this (only the captured group at index 1 is shown; index 0 holds the full matches):
Array
(
[1] => Array
(
[0] => Ms Wendy Walker-hunter
[1] => Ms Wendy Walker-hunter 2
[2] => Ms Wendy Walker-hunter 3
)
)
Array
(
[1] => Array
(
[0] => 161525961468
[1] => 161525961468 2
[2] => 161525961468 3
)
)
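If you want each name paired with its number in one pass, a single pattern with two capture groups and PREG_SET_ORDER is one option. This is only a sketch under the same assumption as above (each value sits on its own line right after its label, and every booking has both fields); the $pattern and $passengers names are illustrative.

```php
<?php
$t = '
<div class="booking-section">
<h4>Passenger Details</h4>
<p>
<b>Passenger Name:</b><br />
Ms Wendy Walker-hunter
</p>
<p>
<b>Mobile Number:</b><br />
161525961468
</p>
</div>
<div class="booking-section">
<h4>Passenger Details</h4>
<p>
<b>Passenger Name:</b><br />
Mr John Walker
</p>
<p>
<b>Mobile Number:</b><br />
16153682486
</p>
</div>';

// Group 1: the line after "Passenger Name:", group 2: the line after "Mobile Number:".
// The lazy .*? (with the s flag) skips the markup between the two labels.
$pattern = '/Passenger Name:[^\r\n]*\r?\n([^\r\n]+)\r?\n.*?Mobile Number:[^\r\n]*\r?\n([^\r\n]+)/s';

// PREG_SET_ORDER returns one array per booking instead of one array per group.
preg_match_all($pattern, $t, $matches, PREG_SET_ORDER);

$passengers = [];
foreach ($matches as $m) {
    $passengers[] = ['name' => trim($m[1]), 'phone' => trim($m[2])];
}

print_r($passengers);
```

Like the patterns above, this breaks as soon as the markup is laid out differently, so it is only worth using when you control the formatting of the input.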
Upvotes: 0