Reputation: 1851
I'm working on a project where I'm building a price guide for a trading card game. Forgive the nerdiness level here. I'm pulling data from one website:
$data = mb_convert_encoding(file_get_contents("http://yugioh.wikia.com/api.php?action=query&prop=revisions&titles=Elemental%20HERO%20Shining%20Flare%20Wingman&rvprop=content&format=php"), "HTML-ENTITIES", "UTF-8");
Then I parse it using a series of regex statements:
preg_match_all('/(?<=\|\slore)\s+\=(.*)/', $data, $matches);
$text = $matches[1][0]; //it prints out here just fine
$text = preg_replace("/(\[\[(\w+|\s)*\|)/sx", "" , $text); //it disappears if I try to print it here
$text = preg_replace("/\[\[/", "" , $text);
$text = preg_replace("/\]\]/", "" , $text);
As the comments above show, if I follow the second line (where I grab the matches) with a print_r statement, it prints the text. If I follow the next line with a print statement, it prints nothing. So the regex on that line apparently isn't being applied correctly. What am I doing wrong with it? I thought it had something to do with multiline matching, but I tried that and it didn't help.
EDIT
Here is the text after the first pull:
"[[Elemental HERO Flame Wingman]]" + "[[Elemental HERO Sparkman]]"
Must be [[Fusion Summon]]ed and cannot be [[Special Summon]]ed by other ways. This card gains 300 [[ATK]] for each "[[Elemental HERO]]" card in your [[Graveyard]]. When this card [[destroy]]s a [[Monster Card|monster]] [[Destroyed by Battle|by battle]] and [[send]]s it to the Graveyard: Inflict [[Effect Damage|damage]] to your opponent equal to the ATK of the destroyed monster in the Graveyard.
Upvotes: 3
Views: 51
Reputation: 627607
This regex /(\[\[(\w+|\s)*\|)/sx contains nested quantifiers: \w is used with a + quantifier, and a * is applied to the whole alternation group. Whenever [[ is not followed by a | further on, the engine tries every possible way of splitting each word between the two quantifiers, which creates a huge number of backtracking steps and results in catastrophic backtracking.
The best way to avoid that issue here is a character class, [\w\s]* (it matches 0 or more word characters, i.e. letters, digits or underscores, or whitespace symbols).
See IDEONE demo:
$s = "\"[[Elemental HERO Flame Wingman]]\" + \"[[Elemental HERO Sparkman]]\"\nMust be [[Fusion Summon]]ed and cannot be [[Special Summon]]ed by other ways. This card gains 300 [[ATK]] for each \"[[Elemental HERO]]\" card in your [[Graveyard]]. When this card [[destroy]]s a [[Monster Card|monster]] [[Destroyed by Battle|by battle]] and [[send]]s it to the Graveyard: Inflict [[Effect Damage|damage]] to your opponent equal to the ATK of the destroyed monster in the Graveyard.";
$s = preg_replace('/(\[\[([\w\s]*)\|)/', "" , $s);
echo $s;
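The "disappearing" text is most likely preg_replace returning NULL: that is what it returns when PCRE reports an error such as hitting the backtrack limit, and printing NULL outputs nothing. A minimal sketch of how to make that visible, assuming PHP 7 or later; replaceOrFail is a hypothetical helper name, not something from the question:

```php
<?php
// Hypothetical helper (not from the original post): run a replacement and
// raise an exception on PCRE failure instead of silently passing NULL along.
function replaceOrFail(string $pattern, string $replacement, string $subject): string
{
    $result = preg_replace($pattern, $replacement, $subject);
    if ($result === null) {
        // preg_last_error() returns one of the PREG_*_ERROR constants,
        // e.g. PREG_BACKTRACK_LIMIT_ERROR.
        throw new RuntimeException('preg_replace failed, PCRE error code ' . preg_last_error());
    }
    return $result;
}

// Short sample in the same wiki markup as the question's text.
$sample = 'a [[Monster Card|monster]] and [[Graveyard]]';
echo replaceOrFail('/\[\[[\w\s]*\|/', '', $sample), "\n";
// prints: a monster]] and [[Graveyard]]
```

Wrapping each preg_replace call from the question this way should show which step fails and with which error code, rather than leaving an empty string to debug.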
Also note that you do not need the x modifier (there are no comments or insignificant whitespace in the pattern itself) or the s modifier (there is no . in the pattern).
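As a possible simplification, the three preg_replace calls from the question could be folded into a single pass that keeps only the display text of each link. This is a sketch under assumptions, not a drop-in equivalent: stripWikiLinks is a hypothetical name, and it assumes every [[ has a matching ]] and that links are not nested. The negated character classes cannot overlap with the characters that follow them, so there is no nested-quantifier backtracking.

```php
<?php
// Hypothetical helper: turn [[Target|label]] into "label" and [[Target]] into "Target".
// Assumes well-formed, non-nested links; unpaired brackets are left untouched.
function stripWikiLinks(string $text): string
{
    // (?:[^\]|]*\|)?  optional "Target|" prefix, dropped from the result
    // ([^\]]*)        the text to keep, captured as group 1
    $result = preg_replace('/\[\[(?:[^\]|]*\|)?([^\]]*)\]\]/', '$1', $text);
    if ($result === null) {
        throw new RuntimeException('preg_replace failed, PCRE error code ' . preg_last_error());
    }
    return $result;
}

$sample = 'Must be [[Fusion Summon]]ed; a [[Monster Card|monster]] [[Destroyed by Battle|by battle]]';
echo stripWikiLinks($sample), "\n";
// prints: Must be Fusion Summoned; a monster by battle
```

Unlike the original sequence, this leaves stray [[ or ]] in place, which may or may not be what you want for malformed input.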
Upvotes: 2