Anthony

Reputation: 681

Regular expression to find <character> tags

I need help with regular expressions. I'm trying to use preg_match_all to find <character>...</character> blocks. Here is what my data looks like:

<character>
杜塞尔多夫
杜塞爾多夫
    <div class="hp">dùsàiěrduōfū<div class="hp">dkfjdkfj</div></div>
    <div class="tr"><span class="green"><i>г.</i></span> Duesseldorf (<i>Deutschland</i>)</div>
    <div class="tr"></div>
</character>

<character>
    我, 是谁
    <div class="hp">текст</div>
    <div class="tr">some text in different languages</div>
</character>

I tried \<character\>.*\<\/character> but unfortunately it didn't work. Any suggestions?

Upvotes: 1

Views: 182

Answers (4)

seanmonstar

Reputation: 11444

Unless you're required at gunpoint to use regular expressions to do this, DOMDocument will be far more accurate.

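<?php

// $data holds the markup from the question. loadXML() expects a single
// root element, so the <character> blocks are wrapped in one first.
$dom = new DOMDocument;
$dom->loadXML('<root>' . $data . '</root>');

// DOMNodeList containing every <character> element
$character_nodes = $dom->getElementsByTagName('character');

// use $character_nodes...
?>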
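To illustrate the "use $character_nodes" step, here is a minimal sketch. The sample string and the <root> wrapper are assumptions added for the example (loadXML() needs one top-level element), and $texts is just a hypothetical variable name:

```php
<?php
// Hypothetical sample input, shortened from the question's data.
$data = '<character>我, 是谁<div class="hp">текст</div><div class="tr">some text</div></character>'
      . '<character>杜塞尔多夫<div class="hp">dùsàiěrduōfū</div></character>';

$dom = new DOMDocument;
// Assumption: wrap the fragments in a single root so the string is well-formed XML.
$dom->loadXML('<root>' . $data . '</root>');

$texts = [];
foreach ($dom->getElementsByTagName('character') as $node) {
    // textContent concatenates all descendant text with the tags stripped.
    $texts[] = $node->textContent;
}

echo count($texts), "\n";
```

If you need the nested div elements separately, calling getElementsByTagName('div') on each $node should work the same way.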

Upvotes: 5

Richard Simões

Reputation: 12791

If using the preg family of functions, your regular expression should be:

/\<character>(.*?)\<\/character>/s

The non-greedy operator ? will prevent you from getting only one match that starts at the first <character> and ends at the last </character>. The /s flag will allow your dot to match line breaks.
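As a rough sketch of the difference, assuming a shortened sample of the question's data (the variable names are made up for illustration):

```php
<?php
// Hypothetical sample input modelled on the question's data.
$data = <<<XML
<character>
杜塞尔多夫
    <div class="hp">dùsàiěrduōfū</div>
</character>

<character>
    我, 是谁
    <div class="tr">some text</div>
</character>
XML;

// Lazy quantifier plus the s modifier: one match per <character> block.
$lazyCount = preg_match_all('/<character>(.*?)<\/character>/s', $data, $matches);

// Greedy quantifier for comparison: a single match spanning both blocks.
$greedyCount = preg_match_all('/<character>(.*)<\/character>/s', $data, $greedy);

// Without the s modifier the dot stops at line breaks, so nothing matches here.
$noFlagCount = preg_match_all('/<character>(.*?)<\/character>/', $data, $none);

echo $lazyCount, " ", $greedyCount, " ", $noFlagCount, "\n";
```

$matches[1] then holds the inner content of each block, and $matches[0] the full blocks including the tags.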

Upvotes: 3

Jonas

Reputation: 1563

Try

<character>(.*?)<\/character>

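The question mark makes the quantifier ungreedy (lazy), meaning it'll match a string as short as possible. Also, < and > don't need escaping. In PHP the pattern still needs delimiters, and because the data spans several lines it also needs the s modifier so that the dot matches line breaks.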

Upvotes: 2

Tim Sylvester

Reputation: 23128

You may need to use the "/u" modifier to correctly process UTF-8 text.

http://php.net/manual/en/reference.pcre.pattern.modifiers.php
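A small sketch of what the modifier changes, assuming the script file itself is saved as UTF-8 (the sample string and variable names are hypothetical):

```php
<?php
// Three Chinese characters, nine bytes when encoded as UTF-8.
$text = '我是谁';

// Without u, the dot matches one byte at a time.
$byteMatches = preg_match_all('/./s', $text);

// With u, pattern and subject are treated as UTF-8, so the dot matches whole characters.
$charMatches = preg_match_all('/./su', $text);

echo $byteMatches, " ", $charMatches, "\n";
```

Whether this matters for the <character> pattern depends on what you do inside the match. It becomes relevant once the pattern refers to individual characters or character classes.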

Upvotes: 0
