Dzhuneyt

Reputation: 8701

Cannot Parse Multiline with Regular Expressions

<trans-unit id="8">
<source>Special settings for:</source>
<target>Special settings for:</target>
</trans-unit>

I am trying to get the ID and the contents of the target tag. The above structure is repeated many times in the XML I am trying to parse.

I am currently using the expression below, but it returns only empty arrays:

preg_match_all('#<trans-unit id="(.*)">(.*)<target>(.*)</target>(.*)</trans-unit>#Ui', $xml, $matches);

Upvotes: 2

Views: 127

Answers (3)

strnk

Reputation: 2063

You have to add the s modifier (PCRE_DOTALL) to your regular expression in PHP so that the dot (.) matches newlines too. Without it, .* stops at the end of each line, so a pattern that has to span several lines never matches.

Edit: changed the m option to s for future reference; see the comment below.
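
To illustrate the difference between the two modifiers, here is a minimal sketch. The subject string is a made-up fragment for demonstration, not the asker's actual file.

```php
<?php
// Hypothetical multi-line subject (an assumption for illustration only).
$subject = "<target>\nSpecial settings for:\n</target>";

// m (PCRE_MULTILINE) only changes how ^ and $ behave; the dot still stops at "\n".
$withM = preg_match('#<target>(.*)</target>#m', $subject);

// s (PCRE_DOTALL) lets the dot match newline characters as well.
$withS = preg_match('#<target>(.*)</target>#s', $subject, $m);

var_dump($withM); // int(0)
var_dump($withS); // int(1)
```

The captured text in $m[1] still contains the surrounding line breaks, so trim() it if you only want the label.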

Upvotes: 0

Dzhuneyt

Reputation: 8701

You can use the s pattern modifier (PCRE_DOTALL) to make the dot (.) match all characters, including newlines. Without it, the dot matches everything except a newline, so (.*) can never span the line breaks between the tags.

http://php.net/manual/en/reference.pcre.pattern.modifiers.php

So in the above example:

preg_match_all('#<trans-unit id="(.*)">(.*)<target>(.*)</target>(.*)</trans-unit>#Uis', $xml, $matches);
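
As a rough check of the pattern above, the sketch below runs it against a small sample. The `<body>` wrapper and the second trans-unit (id 9) are invented for the example; only the first unit comes from the question.

```php
<?php
// Sample input: the first unit is from the question, the second is hypothetical.
$xml = <<<XML
<body>
<trans-unit id="8">
<source>Special settings for:</source>
<target>Special settings for:</target>
</trans-unit>
<trans-unit id="9">
<source>Save</source>
<target>Speichern</target>
</trans-unit>
</body>
XML;

// U makes every quantifier lazy, i ignores case, s lets the dot cross newlines.
$pattern = '#<trans-unit id="(.*)">(.*)<target>(.*)</target>(.*)</trans-unit>#Uis';
$count = preg_match_all($pattern, $xml, $matches);

// Group 1 holds the ids, group 3 the target contents.
print_r($matches[1]);
print_r($matches[3]);
```

Groups 2 and 4 only soak up the text around the target tag; if you do not need them, `(?:.*)` would keep $matches smaller.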

Upvotes: 1

xdazz

Reputation: 160883

Use an XML parser instead.

// $string holds the raw XML document.
$xml = simplexml_load_string($string);
print_r($xml);
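
To get from the parsed document to the id and target values the question asks for, something like the following sketch might work. It assumes the document has a single root element and no default XML namespace (real XLIFF files usually declare one, in which case the XPath query needs a registered prefix). The `<body>` wrapper and the second unit are invented sample data.

```php
<?php
// Sample input: the first unit is from the question, the rest is hypothetical.
$string = <<<XML
<body>
<trans-unit id="8">
<source>Special settings for:</source>
<target>Special settings for:</target>
</trans-unit>
<trans-unit id="9">
<source>Save</source>
<target>Speichern</target>
</trans-unit>
</body>
XML;

$xml = simplexml_load_string($string);
if ($xml === false) {
    throw new RuntimeException('Could not parse the XML input');
}

// Collect the id attribute and <target> text of every trans-unit at any depth.
$pairs = [];
foreach ($xml->xpath('//trans-unit') as $unit) {
    $pairs[] = [
        'id'     => (string) $unit['id'],
        'target' => (string) $unit->target,
    ];
}

print_r($pairs);
```

Casting to string matters here: without it, the array would hold SimpleXMLElement objects rather than plain text.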

Upvotes: 4
