Reputation: 681
I need help with regular expressions. I'm trying to use preg_match_all to find <character>...</character> blocks. Here is what my data looks like:
<character>
杜塞尔多夫
杜塞爾多夫
<div class="hp">dùsàiěrduōfū<div class="hp">dkfjdkfj</div></div>
<div class="tr"><span class="green"><i>г.</i></span> Duesseldorf (<i>Deutschland</i>)</div>
<div class="tr"></div>
</character>
<character>
我, 是谁
<div class="hp">текст</div>
<div class="tr">some text in different languages</div>
</character>
I tried \<character\>.*\<\/character>
but unfortunately it didn't work. Any suggestions?
Upvotes: 1
Views: 182
Reputation: 11444
Unless you're required at gunpoint to use regular expressions to do this, DOMDocument will be far more accurate.
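<?php
$dom = new DOMDocument;
// loadXML() expects well-formed XML with a single root element;
// if $data is only a run of <character> blocks, wrap it in a root element first.
$dom->loadXML($data);
// Returns a DOMNodeList of every <character> element in the document.
$character_nodes = $dom->getElementsByTagName('character');
// use $character_nodes...
?>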
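As a rough sketch of how the DOMDocument approach could be applied to data shaped like the question's, the snippet below wraps the fragment in a root element and collects the text of each block. The sample $data string, the <root> wrapper name, and the $texts array are assumptions made for illustration; the wrapping step also assumes the input carries no XML declaration of its own.

```php
<?php
// Hypothetical sample input, shortened from the data in the question.
$data = '<character>杜塞尔多夫<div class="hp">dùsàiěrduōfū</div></character>'
      . '<character>我, 是谁<div class="tr">some text</div></character>';

$dom = new DOMDocument;
// Assumption: $data is a fragment without a single root, so add one.
$dom->loadXML('<root>' . $data . '</root>');

$texts = [];
foreach ($dom->getElementsByTagName('character') as $node) {
    // textContent is the concatenated text of the node and its descendants;
    // $dom->saveXML($node) would return the markup instead.
    $texts[] = $node->textContent;
}

echo count($texts), "\n";
```

If the markup of each block is needed rather than its text, $dom->saveXML($node) inside the loop is one way to get it.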
Upvotes: 5
Reputation: 12791
If you are using the preg family of functions, your regular expression should be:
/\<character>(.*?)\<\/character>/s
The non-greedy modifier ? stops .* from producing a single match that runs from the first <character> to the last </character>.
The /s flag lets the dot match line breaks.
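To make the difference concrete, here is a minimal sketch that runs both the lazy and the greedy pattern through preg_match_all. The sample $data and the variable names $lazy and $greedy are made up for illustration; the u modifier is added on the assumption that the input is UTF-8.

```php
<?php
// Hypothetical sample input modelled on the data in the question.
$data = <<<XML
<character>
杜塞尔多夫
<div class="hp">dùsàiěrduōfū</div>
</character>
<character>
我, 是谁
<div class="tr">some text</div>
</character>
XML;

// Lazy .*? with /s: each <character> block is matched on its own.
preg_match_all('/<character>(.*?)<\/character>/su', $data, $lazy);

// Greedy .* with /s: one match from the first opening tag to the last closing tag.
preg_match_all('/<character>(.*)<\/character>/su', $data, $greedy);

// Index 0 holds the full matches, index 1 the captured inner content.
echo count($lazy[1]), "\n";   // number of blocks found by the lazy pattern
echo count($greedy[1]), "\n"; // number of matches found by the greedy pattern
```

With this input the lazy pattern yields two matches and the greedy one yields a single match spanning both blocks.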
Upvotes: 3
Reputation: 1563
Try
<character>(.*?)<\/character>
The question mark makes the quantifier ungreedy, meaning it will match as short a string as possible. Also, < and > don't need escaping.
Upvotes: 2
Reputation: 23128
You may need to use the "/u" option to process UTF-8 text correctly.
http://php.net/manual/en/reference.pcre.pattern.modifiers.php
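A small sketch of what the u modifier changes, assuming the subject string is valid UTF-8: without it the dot matches single bytes, with it the dot matches whole code points. The $text sample and the variable names are illustrative only.

```php
<?php
// Two CJK characters, each encoded as three bytes in UTF-8.
$text = '我是';

// Without u, the subject is treated as a byte string.
$byteCount = preg_match_all('/./', $text, $bytes);

// With u, pattern and subject are treated as UTF-8.
$charCount = preg_match_all('/./u', $text, $chars);

echo $byteCount, "\n"; // matches counted per byte
echo $charCount, "\n"; // matches counted per character
```

For the <character> pattern itself the tags are plain ASCII, so the modifier matters mostly when the pattern contains character classes, the dot, or literal non-ASCII text.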
Upvotes: 0