Regular expression to find <character> tags

I need help with regular expressions. I'm trying to use a regular expression with preg_match_all to find <character>...</character> blocks. Here is what my data looks like:

<character>
杜塞尔多夫
杜塞爾多夫
    <div class="hp">dùsàiěrduōfū<div class="hp">dkfjdkfj</div></div>
    <div class="tr"><span class="green"><i>г.</i></span> Duesseldorf (<i>Deutschland</i>)</div>
    <div class="tr"></div>
</character>

<character>
    我, 是谁
    <div class="hp">текст</div>
    <div class="tr">some text in different languages</div>
</character>

I tried \<character\>.*\<\/character> but unfortunately it didn't work. Any suggestions?
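A minimal sketch of why that attempt finds nothing on multi-line data, assuming the pattern was wrapped in / delimiters with no modifiers (the question does not show the delimiters). The sample strings are shortened, hypothetical versions of the data above.

```php
<?php
// Hypothetical multi-line sample in the question's format.
$data = "<character>\n    我, 是谁\n    <div class=\"hp\">текст</div>\n</character>\n";

// The attempted pattern, assuming / delimiters and no flags.
$pattern = '/\<character\>.*\<\/character>/';

// Without the s modifier "." does not match "\n", so ".*" cannot reach
// a closing tag that sits on a later line.
$count = preg_match_all($pattern, $data, $matches);
echo $count, PHP_EOL; // 0

// The same pattern does match when the whole block is on one line.
$oneLine = preg_match_all($pattern, '<character>x</character>', $m);
echo $oneLine, PHP_EOL; // 1
```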

seanmonstar

Unless you're required at gunpoint to use regular expressions to do this, DOMDocument will be far more accurate.

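<?php

$dom = new DOMDocument;
// loadXML() requires a single root element; the sample data has several
// top-level <character> blocks, so wrap them in one.
$dom->loadXML('<root>' . $data . '</root>');

// DOMNodeList containing every <character> element
$character_nodes = $dom->getElementsByTagName('character');

// use $character_nodes...
?>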
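One possible way to "use $character_nodes", sketched under the assumption that you want the inner markup of each block as a string. The <root> wrapper element and the $blocks array are arbitrary names chosen for this example, and the data is a shortened version of the question's.

```php
<?php
// Hypothetical sample data in the question's format (two <character> blocks).
$data = <<<XML
<character>
杜塞尔多夫
    <div class="hp">dùsàiěrduōfū<div class="hp">dkfjdkfj</div></div>
</character>
<character>
    我, 是谁
    <div class="hp">текст</div>
</character>
XML;

$dom = new DOMDocument;
// Wrap the fragment so loadXML() sees a single root element.
$dom->loadXML('<root>' . $data . '</root>');

$character_nodes = $dom->getElementsByTagName('character');

$blocks = [];
foreach ($character_nodes as $node) {
    // Serialize each child node to rebuild the element's inner markup.
    $inner = '';
    foreach ($node->childNodes as $child) {
        $inner .= $dom->saveXML($child);
    }
    $blocks[] = trim($inner);
}

echo count($blocks), PHP_EOL; // 2
```

Because the parser understands nesting, the inner <div class="hp"> inside the first block stays part of that block's markup.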

Richard Simões

If using the preg family of functions, your regular expression should be:

/\<character>(.*?)\<\/character>/s

The non-greedy ? (in .*?) makes each match stop at the nearest </character>, so you get one match per block instead of a single match running from the first <character> to the last </character>. The /s flag lets the dot match line breaks.
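A small sketch comparing the suggested non-greedy pattern with a greedy variant on hypothetical two-block sample data, with the s flag in both cases. The variable names are illustrative only.

```php
<?php
// Hypothetical sample data in the question's format.
$data = <<<XML
<character>
    我, 是谁
    <div class="hp">текст</div>
</character>

<character>
    <div class="tr">some text in different languages</div>
</character>
XML;

// Non-greedy (.*?) with the s flag: one match per <character> block.
$lazy = preg_match_all('/\<character>(.*?)\<\/character>/s', $data, $matches);

// Greedy (.*) with the s flag: one match spanning from the first opening
// tag to the last closing tag.
$greedy = preg_match_all('/\<character>(.*)\<\/character>/s', $data, $greedyMatches);

echo $lazy, ' ', $greedy, PHP_EOL; // 2 1

// $matches[1] holds the captured contents of each block.
foreach ($matches[1] as $inner) {
    echo trim($inner), PHP_EOL;
}
```

Note that this pattern assumes <character> elements are never nested inside each other; if they can be, the DOMDocument approach is the safer choice.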