csha

Reputation: 9564

PHP preg_replace();

I've got a problem with the regexp function preg_replace() in PHP. I want to extract the viewstate value from an HTML input element, but it doesn't work properly.

This code:

$viewstate = preg_replace('/^(.*)(<input\s+id="__VIEWSTATE"\s+type="hidden"\s+value=")(.*[^"])("\s+name="__VIEWSTATE">)(.*)$/u','^\${3}$',$html);

Returns this:

%0D%0A%0D%0A%3C%21DOCTYPE+html+PUBLIC+%22-%2F%2FW3C%2F%2FDTD+XHTML+1.0+Transitional%2F%2FEN%22+%22http%3A%2F%2Fwww.w3.org%2FTR%2Fxhtml1%2FDTD%2Fxhtml1-transitional.dtd%22%3E%0D%0A%0D%0A%3Chtml+xmlns%3D%22http%3A%2F%2Fwww.w3.org%2F1999%2Fxhtml%22+%3E%0D%0A%3Chead%3E%3Ctitle%3E%0D%0A%09Strava.cz%0D%0A%3C%2Ftitle%3E%3Clink+rel%3D%22shortcut+icon%22+href%3D%22..%2FGrafika%2Ffavicon.ico%22+type%3D%22image%2Fx-icon%22+%2F%3E%3Clink+rel%3D%22stylesheet%22+type%3D%22text%2Fcss%22+media%3D%22screen%22+href%3D%22..%2FStyly%2FZaklad.css%22+%2F%3E%0D%0A++++%3Cstyle+type%3D%22text%2Fcss%22%3E%0D%0A++++++++.style1%0D%0A++++++++%7B%0D%0A++++++++++++width%3A+47px%3B%0D%0A++++++++%7D%0D%0A++++++++.style2%0D%0A++++++++%7B%0D%0A++++++++++++width%3A+64px%3B%0D%0A++++++++%7D%0D%0A++++%3C%2Fstyle%3E%0D%0A%0D%0A%3Cscript+type%3D%22text%2Fjavascript%22%3E%0D%0A%0D%0A++var+_gaq+%3D+_gaq+%7C%7C+%5B%5D%3B%0D%0A++_gaq.push%28%5B

EDIT: Sorry, I left this question for a long time. In the end I used DOMDocument.

Upvotes: 0

Views: 474

Answers (3)

csha

Reputation: 9564

The main mistake was using the function preg_replace, which returns the subject with the replacements applied (or the unchanged subject when nothing matches), not the matched pattern or the replacement alone. Thank you for your ideas and for recommending DOMDocument. m93a

http://www.php.net/manual/en/function.preg-replace.php#refsect1-function.preg-replace-returnvalues
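To illustrate the difference, here is a minimal sketch. The `$html` sample and the variable names are made up for the example and are not the real page. When the pattern fails to match, preg_replace() hands the subject back unchanged, while preg_match() reports the failure and otherwise exposes the captured group.

```php
<?php
// Hypothetical sample markup; the real page is much longer.
$html = "<html>\n<body>\n"
      . '<input id="__VIEWSTATE" type="hidden" value="abc123" name="__VIEWSTATE">'
      . "\n</body>\n</html>";

// Anchored pattern in the style of the question. Without the /s modifier,
// "." does not match newlines, so this cannot match a multi-line document.
$pattern = '/^(.*)<input\s+id="__VIEWSTATE"\s+type="hidden"\s+value="([^"]*)"\s+name="__VIEWSTATE">(.*)$/u';

// preg_replace() returns the subject itself when nothing matches.
$replaced = preg_replace($pattern, '$2', $html);
var_dump($replaced === $html); // bool(true): the whole document comes back

// preg_match() fills $m with the captured groups instead.
$viewstate = null;
if (preg_match('/<input\s+id="__VIEWSTATE"\s+type="hidden"\s+value="([^"]*)"/', $html, $m)) {
    $viewstate = $m[1];
}
var_dump($viewstate); // string(6) "abc123"
```

Checking the return value of preg_match() before reading `$m[1]` keeps a failed match from going unnoticed.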

Upvotes: 0

poncha

Reputation: 7866

To be safe, I'd split this match into two phases:

  1. Find the relevant input element
  2. Get the value

This is because you cannot be certain what order the attributes in the element will be in.

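// Phase 1: find the whole <input> tag, whatever the attribute order.
if (preg_match('/<input[^>]+name="__VIEWSTATE"[^>]*>/i', $input, $match))
    // Phase 2: pull the value attribute out of the matched tag.
    $value = preg_replace('/.*value="([^"]*)".*/i', '$1', $match[0]);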

And, of course, always consider DOM and DOMXPath over regex for parsing HTML/XML.
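A DOM-based lookup could look roughly like the sketch below. The sample markup is an assumption for illustration (with the attributes in a different order than in the question), and the XPath expression simply selects the input by its name attribute.

```php
<?php
// Hypothetical sample; attribute order deliberately differs from the question.
$html = '<html><body><form>'
      . '<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="abc123" />'
      . '</form></body></html>';

$doc = new DOMDocument();
// Real pages are often not valid markup; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

// Select the input element by its name attribute, regardless of attribute order.
$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//input[@name="__VIEWSTATE"]');

$viewstate = null;
if ($nodes !== false && $nodes->length > 0) {
    $viewstate = $nodes->item(0)->getAttribute('value');
}
var_dump($viewstate); // string(6) "abc123"
```

Because the parser handles quoting and attribute order, the lookup does not depend on how the tag happens to be written.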

Upvotes: 2

Tobias Sjösten

Reputation: 1094

You should only capture when you're planning on using the data, so most of the () groups are unnecessary in that regexp pattern. That's not the cause of the failure, but I thought I'd mention it.

Instead of using [^"] to exclude that character, you could use the non-greedy (lazy) modifier, ?. It makes the pattern match as little as it can. Since name="__VIEWSTATE" follows the value, this should be safe.

Let's put this into practice and simplify the pattern a bit. This works as you want:

'/.*<input\s+id="__VIEWSTATE"\s+type="hidden"\s+value="(.+?)"\s+name="__VIEWSTATE">.*/'
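To see what the lazy quantifier changes, here is a small sketch on a made-up one-line snippet (the `$line` sample is an assumption, not the real page). It compares a greedy and a lazy capture, then runs the full pattern through preg_match().

```php
<?php
// Hypothetical single-line sample with the attribute order the pattern expects.
$line = '<input id="__VIEWSTATE" type="hidden" value="abc123" name="__VIEWSTATE">';

// Greedy: ".+" runs to the last double quote on the line.
preg_match('/value="(.+)"/', $line, $greedy);
var_dump($greedy[1]); // abc123" name="__VIEWSTATE

// Lazy: ".+?" stops at the first double quote that lets the match succeed.
preg_match('/value="(.+?)"/', $line, $lazy);
var_dump($lazy[1]); // abc123

// The full pattern from above, used with preg_match() to read group 1.
$pattern = '/.*<input\s+id="__VIEWSTATE"\s+type="hidden"\s+value="(.+?)"\s+name="__VIEWSTATE">.*/';
$value = preg_match($pattern, $line, $m) ? $m[1] : null;
var_dump($value); // abc123
```

Note that in a multi-line document the surrounding `.*` only cover the line containing the input unless the `s` modifier is added, so with preg_replace() the other lines would stay in the result.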

I would strongly recommend checking out an alternative to regexp for DOM operations. That makes sure your code also works if the attributes change order. Plus it's so much nicer to work with.

Upvotes: 1
