Shattuck

Reputation: 2780

PHP preg_match_all question

I have a question about a regular expression that is giving me grief. I have a list of items delimited by tags, and I am trying to extract everything between two particular tags (which occur multiple times). Here is a sample of the list I am parsing:


<ResumeResultItem_V3>
    <ResumeTitle>Johnson</ResumeTitle>
    <RecentEmployer>University of Phoenix</RecentEmployer>
    <RecentJobTitle>Advisor</RecentJobTitle>
    <RecentPay>40000</RecentPay>
</ResumeResultItem_V3>

<ResumeResultItem_V3>
    <ResumeTitle>ResumeforJake</ResumeTitle>
    <RecentEmployer>APEX</RecentEmployer>
    <RecentJobTitle>Consultant</RecentJobTitle>
    <RecentPay>66000</RecentPay>
</ResumeResultItem_V3>


I'm trying to get everything between each pair of <ResumeResultItem_V3> and </ResumeResultItem_V3> tags as a blob of text, but I can't seem to get the expression right.

Here is the code I have so far:




$test = "(<ResumeResultItem_V3>)";
$test2 = "(<\/ResumeResultItem_V3>)";

preg_match_all("/" . $test . "(\w+)" . $test2 . "/", $xml, $matches);

foreach ($matches[0] as $match) {
       echo $match;
       echo "<br /><br />";
}

How can I fix this?

Upvotes: 1

Views: 196

Answers (4)

Corey

Reputation:

If you can use the output as an array with 1 item for each of the "text blob" matches, try this:

<?php
$text =
"<ResumeResultItem_V3>
    <ResumeTitle>Johnson</ResumeTitle>
    <RecentEmployer>University of Phoenix</RecentEmployer>
    <RecentJobTitle>Advisor</RecentJobTitle>
    <RecentPay>40000</RecentPay>
</ResumeResultItem_V3>

<ResumeResultItem_V3>
    <ResumeTitle>ResumeforJake</ResumeTitle>
    <RecentEmployer>APEX</RecentEmployer>
    <RecentJobTitle>Consultant</RecentJobTitle>
    <RecentPay>66000</RecentPay>
</ResumeResultItem_V3>";

$matches = preg_split("/<\/ResumeResultItem_V3>/",preg_replace("/<ResumeResultItem_V3>/","",$text));
print_r($matches);
?>

Results in:

Array
(
    [0] => 
    <ResumeTitle>Johnson</ResumeTitle>
    <RecentEmployer>University of Phoenix</RecentEmployer>
    <RecentJobTitle>Advisor</RecentJobTitle>
    <RecentPay>40000</RecentPay>

    [1] => 


    <ResumeTitle>ResumeforJake</ResumeTitle>
    <RecentEmployer>APEX</RecentEmployer>
    <RecentJobTitle>Consultant</RecentJobTitle>
    <RecentPay>66000</RecentPay>

    [2] => 
)
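The result above still carries leading whitespace and an empty trailing element. A minimal sketch of one way to tidy that up, assuming the same split-on-closing-tag approach; the variable names $parts and $blobs and the shortened sample are only for illustration:

```php
<?php
// Shortened version of the sample from the question.
$text = "<ResumeResultItem_V3>
    <ResumeTitle>Johnson</ResumeTitle>
    <RecentPay>40000</RecentPay>
</ResumeResultItem_V3>

<ResumeResultItem_V3>
    <ResumeTitle>ResumeforJake</ResumeTitle>
    <RecentPay>66000</RecentPay>
</ResumeResultItem_V3>";

// Drop the opening tags, then split on the closing tags as before.
$parts = preg_split(
    '/<\/ResumeResultItem_V3>/',
    str_replace('<ResumeResultItem_V3>', '', $text)
);

// Trim each piece, discard the pieces that end up empty, and reindex from 0.
$blobs = array_values(array_filter(array_map('trim', $parts), 'strlen'));

print_r($blobs);
```

Trimming before filtering matters here: a piece that contains only newlines is not empty until it has been trimmed.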

Upvotes: 1

Kevin Peno

Reputation: 9196

I'm making assumptions about your XML structure, but I really think you should be using an XML parser, like SimpleXML, instead of a regex.

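// asXML() returns each item with its child markup (including the outer tags);
// a (string) cast would return only the text directly inside the element
$xml = new SimpleXMLElement( $file );
foreach( $xml->ResumeResultItem_V3 as $ResumeResultItem_V3 )
    echo $ResumeResultItem_V3->asXML();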
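For a self-contained version of the parser approach, here is a rough sketch. It assumes the items sit inside a single root element; <Resumes> is a made-up name, since the question doesn't show the enclosing document, and $titles is only there for illustration:

```php
<?php
// Assumption: <Resumes> is a hypothetical root element wrapping the items.
$file = <<<XML
<Resumes>
  <ResumeResultItem_V3>
    <ResumeTitle>Johnson</ResumeTitle>
    <RecentEmployer>University of Phoenix</RecentEmployer>
    <RecentJobTitle>Advisor</RecentJobTitle>
    <RecentPay>40000</RecentPay>
  </ResumeResultItem_V3>
  <ResumeResultItem_V3>
    <ResumeTitle>ResumeforJake</ResumeTitle>
    <RecentEmployer>APEX</RecentEmployer>
    <RecentJobTitle>Consultant</RecentJobTitle>
    <RecentPay>66000</RecentPay>
  </ResumeResultItem_V3>
</Resumes>
XML;

$xml = new SimpleXMLElement($file);

$titles = [];
foreach ($xml->ResumeResultItem_V3 as $item) {
    // Individual fields are available as properties of each item.
    $titles[] = (string) $item->ResumeTitle;

    // asXML() gives the whole item back as a string, tags included.
    echo $item->asXML(), "\n\n";
}
```

Once the data is parsed, you usually don't need the "blob" at all, because each field can be read directly.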

Upvotes: 2

mario

Reputation: 145482

You are probably better off with SimpleXML for extracting the data here.

To answer the regex question as well: \w+ matches only word characters (letters, digits, and underscore). In this case you want to match pretty much everything between the delimiters, which .*? can be used for.

preg_match_all("/$test(.*?)$test2/s", $xml, $matches);

This only works with the /s modifier, though, because without it . does not match newlines.
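To see the effect of the modifier, here is a small sketch that runs the same pattern with and without /s. The sample is a shortened version of the one in the question, and the names $countWithoutS, $countWithS, $m1 and $m2 are just for illustration:

```php
<?php
// Shortened sample; each item spans several lines.
$xml = "<ResumeResultItem_V3>\n    <ResumeTitle>Johnson</ResumeTitle>\n</ResumeResultItem_V3>\n\n"
     . "<ResumeResultItem_V3>\n    <ResumeTitle>ResumeforJake</ResumeTitle>\n</ResumeResultItem_V3>";

$test  = "(<ResumeResultItem_V3>)";
$test2 = "(<\/ResumeResultItem_V3>)";

// Without /s the dot stops at the first newline, so nothing matches.
$countWithoutS = preg_match_all("/$test(.*?)$test2/", $xml, $m1);

// With /s the dot also matches newlines, so each item is found.
$countWithS = preg_match_all("/$test(.*?)$test2/s", $xml, $m2);

// $test and $test2 are capture groups themselves, so the content
// between the tags ends up in group 2.
foreach ($m2[2] as $blob) {
    echo trim($blob), "\n";
}
```

Note that $matches[0] holds the full match including both tags, so use $matches[2] if you only want what is between them.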

Upvotes: 1

Mark Elliot

Reputation: 77034

Ignoring that you probably ought to use an XML parser, and that PHP has one you can use...

The issue is that \w+ matches only word characters, not any character. Spaces, angle brackets, and most other punctuation aren't word characters, so your match fails. Instead, match any character with ., repeated one or more times with +. Because that would otherwise grab as much as possible and span several items, add ? to make the quantifier non-greedy. Your expression should work if you change \w+ to .+?. For . to match newlines as well, you also need the s modifier, so:

preg_match_all('/' . $test . '(.+?)' . $test2 . '/s', $xml, $matches);
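The difference the ? makes can be checked with a quick sketch like the following. It uses a shortened version of the sample from the question, and the names $greedyCount, $lazyCount, $greedy and $lazy are only illustrative:

```php
<?php
// Shortened sample with two items.
$xml = "<ResumeResultItem_V3>\n    <ResumeTitle>Johnson</ResumeTitle>\n</ResumeResultItem_V3>\n\n"
     . "<ResumeResultItem_V3>\n    <ResumeTitle>ResumeforJake</ResumeTitle>\n</ResumeResultItem_V3>";

// Greedy: .+ runs to the LAST closing tag, so both items become one match.
$greedyCount = preg_match_all(
    '/<ResumeResultItem_V3>(.+)<\/ResumeResultItem_V3>/s',
    $xml,
    $greedy
);

// Non-greedy: .+? stops at the FIRST closing tag it can, one match per item.
$lazyCount = preg_match_all(
    '/<ResumeResultItem_V3>(.+?)<\/ResumeResultItem_V3>/s',
    $xml,
    $lazy
);

echo "greedy: $greedyCount match(es), non-greedy: $lazyCount match(es)\n";
```

With the greedy version, the single captured group contains the first item's closing tag and the second item's opening tag, which is exactly the "grouping excessively" problem.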

Upvotes: 1
