rishijd

Reputation: 1344

Decode a byte encoded string via my URL

We have a PHP site on Zend Framework with a PostgreSQL backend database. Our primary character encoding is UTF-8.

I just checked our error log and found a strange entry. My URL is as follows: www.mydomain.com/schuhe-für-breite-füsse

however someone (or maybe a bot) has tried to access this URL as follows: www.mydomain.com/schuhe-f\xc3\xbcr-breite-f\xc3\xbcsse/

It's the first time I've seen something like the above. Two things are happening on my page: 1) The above URL is queried against our CMS. This works fine for some reason; I think PostgreSQL realises it is byte-encoded and converts it back when trying to find this SEF URL in our database.

2) An Ajax request is made on the page, passing the same SEF URL. This fails. I believe the backslashes are causing a problem in JavaScript.

To avoid this, I want to decode any URL that is encoded like this. However, a quick test of the following code did not decode anything for me :(

$landing_sef_url = $this->_getParam('landing_sef_url');
$utf8=html_entity_decode($landing_sef_url);
$iso8859=utf8_decode($utf8);
$test3 = html_entity_decode($landing_sef_url, 1, "ISO-8859-1");
$test4 = urldecode($landing_sef_url);

echo utf8_decode("$landing_sef_url");
echo "<br/><br/>";
die($landing_sef_url . " -- $utf8 -- $iso8859 <br/>$test3<br/>$test4");

I found the above via various posts online but they all print back the same result - schuhe-f\xc3\xbcr-breite-f\xc3\xbcsse

Any help would be MUCH appreciated. Many thanks!

Upvotes: 1

Views: 238

Answers (1)

Evert

Reputation: 99841

The stripcslashes() function seems to do what you're looking for:

http://li.php.net/manual/en/function.stripcslashes.php

But if you're just looking to unescape \x## sequences, you could also do this with a fairly simple regular expression.
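
As a rough sketch of both options, assuming the parameter arrives as a plain string containing literal backslash-x sequences, something like the following should work. The helper name decodeHexEscapes is hypothetical and only for illustration; stripcslashes() also unescapes other C-style sequences such as \n and octal escapes, whereas the regular expression touches only \xHH pairs.

```php
<?php
// Hypothetical helper: decode only \xHH escapes and leave every
// other backslash sequence untouched.
function decodeHexEscapes(string $s): string
{
    return preg_replace_callback(
        '/\\\\x([0-9A-Fa-f]{2})/',          // a literal backslash, "x", two hex digits
        static function (array $m): string {
            return chr(hexdec($m[1]));      // turn the hex pair into one raw byte
        },
        $s
    );
}

// Single quotes keep the backslashes literal, mimicking the logged URL.
$raw = 'schuhe-f\xc3\xbcr-breite-f\xc3\xbcsse';

$viaStrip = stripcslashes($raw);     // built-in: handles all C-style escapes
$viaRegex = decodeHexEscapes($raw);  // narrower: only \xHH sequences

echo $viaStrip, "\n";
echo $viaRegex, "\n";
```

Either way the result is a raw byte string, so it is only valid UTF-8 if the escaped bytes were UTF-8 to begin with. Since the input comes from an untrusted request, it is probably worth validating the decoded value before passing it on to the database or the Ajax call.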

Upvotes: 1
