Reputation: 2780
I have a question about a regular expression that is giving me grief. I have a list of items delimited by tags. I am trying to extract everything between two particular tags (which occur multiple times). Here is a sample of the list I am parsing:
<ResumeResultItem_V3>
<ResumeTitle>Johnson</ResumeTitle>
<RecentEmployer>University of Phoenix</RecentEmployer>
<RecentJobTitle>Advisor</RecentJobTitle>
<RecentPay>40000</RecentPay>
</ResumeResultItem_V3>
<ResumeResultItem_V3>
<ResumeTitle>ResumeforJake</ResumeTitle>
<RecentEmployer>APEX</RecentEmployer>
<RecentJobTitle>Consultant</RecentJobTitle>
<RecentPay>66000</RecentPay>
</ResumeResultItem_V3>
I'm trying to get everything between each <ResumeResultItem_V3> and </ResumeResultItem_V3> pair as a blob of text, but I can't seem to get the expression right.
Here is the code I have so far:
$test = "(<ResumeResultItem_V3>)";
$test2 = "(<\/ResumeResultItem_V3>)";
preg_match_all("/" . $test . "(\w+)" . $test2 . "/", $xml, $matches);
foreach ($matches[0] as $match) {
echo $match;
echo "<br /><br />";
}
How can I fix this?
Upvotes: 1
Views: 196
Reputation:
If you can use the output as an array with 1 item for each of the "text blob" matches, try this:
<?php
$text =
"<ResumeResultItem_V3>
<ResumeTitle>Johnson</ResumeTitle>
<RecentEmployer>University of Phoenix</RecentEmployer>
<RecentJobTitle>Advisor</RecentJobTitle>
<RecentPay>40000</RecentPay>
</ResumeResultItem_V3>
<ResumeResultItem_V3>
<ResumeTitle>ResumeforJake</ResumeTitle>
<RecentEmployer>APEX</RecentEmployer>
<RecentJobTitle>Consultant</RecentJobTitle>
<RecentPay>66000</RecentPay>
</ResumeResultItem_V3>";
$matches = preg_split("/<\/ResumeResultItem_V3>/",preg_replace("/<ResumeResultItem_V3>/","",$text));
print_r($matches);
?>
Results in:
Array
(
[0] =>
<ResumeTitle>Johnson</ResumeTitle>
<RecentEmployer>University of Phoenix</RecentEmployer>
<RecentJobTitle>Advisor</RecentJobTitle>
<RecentPay>40000</RecentPay>
[1] =>
<ResumeTitle>ResumeforJake</ResumeTitle>
<RecentEmployer>APEX</RecentEmployer>
<RecentJobTitle>Consultant</RecentJobTitle>
<RecentPay>66000</RecentPay>
[2] =>
)
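The trailing empty element [2] is the empty piece after the last closing tag. A minimal variant, using a shortened copy of the same sample data, passes PHP's PREG_SPLIT_NO_EMPTY flag to preg_split to drop that piece, and trims the newlines left around each blob:

```php
<?php
// Same sample data as above, shortened to two fields per item
$text = "<ResumeResultItem_V3>
<ResumeTitle>Johnson</ResumeTitle>
<RecentPay>40000</RecentPay>
</ResumeResultItem_V3>
<ResumeResultItem_V3>
<ResumeTitle>ResumeforJake</ResumeTitle>
<RecentPay>66000</RecentPay>
</ResumeResultItem_V3>";

// Strip the opening tags, then split on the closing tags.
// PREG_SPLIT_NO_EMPTY discards the empty piece after the last closing tag.
$parts = preg_split(
    "/<\/ResumeResultItem_V3>/",
    preg_replace("/<ResumeResultItem_V3>/", "", $text),
    -1,
    PREG_SPLIT_NO_EMPTY
);

// Remove the newlines left at the start and end of each blob
$matches = array_map('trim', $parts);
print_r($matches);
```

Note that PREG_SPLIT_NO_EMPTY only drops pieces that are truly empty; if the input ended with whitespace after the last closing tag, that whitespace would survive as its own element.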
Upvotes: 1
Reputation: 9196
I'm making assumptions about your XML structure, but I really think you should use an XML parser, like SimpleXML.
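$xml = new SimpleXMLElement( $file ); // $file holds the XML as a string, with a single root element
foreach( $xml->ResumeResultItem_V3 as $ResumeResultItem_V3 ) {
    // (string) would only return the element's own whitespace text; asXML() returns each child's markup
    foreach( $ResumeResultItem_V3->children() as $child )
        echo $child->asXML();
}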
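A fuller sketch of the parser approach, pulling the individual fields out of each item. It assumes the items are wrapped in a single root element, called <Results> here (a hypothetical name), because SimpleXML only accepts a well-formed document with exactly one root; it also assumes the simplexml extension is available, as it is in most PHP builds.

```php
<?php
// Hypothetical root element <Results> wraps the items so the document is well-formed
$xml = "<Results>
<ResumeResultItem_V3>
<ResumeTitle>Johnson</ResumeTitle>
<RecentEmployer>University of Phoenix</RecentEmployer>
<RecentJobTitle>Advisor</RecentJobTitle>
<RecentPay>40000</RecentPay>
</ResumeResultItem_V3>
<ResumeResultItem_V3>
<ResumeTitle>ResumeforJake</ResumeTitle>
<RecentEmployer>APEX</RecentEmployer>
<RecentJobTitle>Consultant</RecentJobTitle>
<RecentPay>66000</RecentPay>
</ResumeResultItem_V3>
</Results>";

$doc = new SimpleXMLElement($xml);
$rows = [];
foreach ($doc->ResumeResultItem_V3 as $item) {
    // Casting a leaf element yields its text content
    $rows[] = [
        'title'    => (string) $item->ResumeTitle,
        'employer' => (string) $item->RecentEmployer,
        'job'      => (string) $item->RecentJobTitle,
        'pay'      => (int) $item->RecentPay,
    ];
}
print_r($rows);
```

If the string arrives without a root, concatenating one on, e.g. "<Results>" . $xml . "</Results>", is one way to make it parseable.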
Upvotes: 2
Reputation: 145482
You are probably better off with SimpleXML for extracting the data here.
But to also answer the regex question: \w+ only matches word characters. In this case you want to match pretty much everything between the delimiters, which is what .*? is for.
preg_match_all("/$test(.*?)$test2/s", $xml, $matches);
This only works with the /s modifier, though, because without it . does not match newlines.
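Because $test and $test2 are themselves wrapped in parentheses, they are capture groups too. With this pattern the text between the tags lands in $matches[2], while $matches[0] (which the question's loop prints) still includes the tags. A small sketch with a shortened copy of the sample data:

```php
<?php
$test = "(<ResumeResultItem_V3>)";
$test2 = "(<\/ResumeResultItem_V3>)";
$xml = "<ResumeResultItem_V3>
<ResumeTitle>Johnson</ResumeTitle>
</ResumeResultItem_V3>
<ResumeResultItem_V3>
<ResumeTitle>ResumeforJake</ResumeTitle>
</ResumeResultItem_V3>";

preg_match_all("/$test(.*?)$test2/s", $xml, $matches);

// $matches[0]: full matches, tags included
// $matches[1], $matches[3]: the opening and closing tags (groups from $test and $test2)
// $matches[2]: only the text between the tags
foreach ($matches[2] as $blob) {
    echo trim($blob), "\n\n";
}
```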
Upvotes: 1
Reputation: 77034
Ignoring that you probably ought to use an XML parser, and that PHP has one you can use...
The issue is that \w+ matches word characters, not any character. Newlines, spaces and most punctuation (including the < and / of the inner tags) aren't word characters, so your match fails. Instead you need to match any character (.) one or more times (+), and because that can match too much (from the first opening tag all the way to the last closing tag), you need the ? modifier to make it non-greedy. Your expression should work if you change \w+ to .+?. The any-character match also requires the s modifier so that . can match newlines, so:
preg_match_all('/' . $test . '(.+?)' . $test2 . '/s', $xml, $matches);
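To see why both the ? and the s modifier matter, this sketch counts the matches each variant finds on a shortened copy of the sample data (preg_match_all returns the number of full matches):

```php
<?php
$test = "(<ResumeResultItem_V3>)";
$test2 = "(<\/ResumeResultItem_V3>)";
$xml = "<ResumeResultItem_V3>
<RecentPay>40000</RecentPay>
</ResumeResultItem_V3>
<ResumeResultItem_V3>
<RecentPay>66000</RecentPay>
</ResumeResultItem_V3>";

// Original pattern: \w+ cannot match the newline right after the opening tag
$original = preg_match_all('/' . $test . '(\w+)' . $test2 . '/', $xml, $m1);

// Without /s, '.' stops at newlines, so there is still no match
$noS = preg_match_all('/' . $test . '(.+?)' . $test2 . '/', $xml, $m2);

// Greedy .+ with /s: one match from the first opening tag to the last closing tag
$greedy = preg_match_all('/' . $test . '(.+)' . $test2 . '/s', $xml, $m3);

// Non-greedy .+? with /s: one match per item
$lazy = preg_match_all('/' . $test . '(.+?)' . $test2 . '/s', $xml, $m4);

var_dump($original, $noS, $greedy, $lazy); // prints int(0), int(0), int(1), int(2)
```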
Upvotes: 1