user993683

Using spéciâl characters in URLs

I'm developing a website which lets people create their own translator. They can choose the name of the URL, and it is sent to a database and I use .htaccess to redirect website.com/nameoftheirtranslator

to:

website.com/translator.php?name=nameoftheirtranslator

Here's my problem:

Recently, I've noticed that someone has created a translator with special characters in the name -> "LAEFÊVËŠI".

But when it is processed (posted to a php file, then mysqli_real_escape_string) and added to the database it appears as simply "LAEFVI" - so you can see the special characters have been lost somewhere.

I'm not quite sure what to do here, but I think there are two paths:

  1. Try to keep the characters and do some encoding (no idea where to start)
  2. Ditch them and tell users to only use 'normal' characters in the names of their translators (not ideal)

I'm wondering whether it's even possible to have a url like website.com/LAEFÊVËŠI - can that be interpreted by the server?

EDIT1: I notice that Stack Overflow, on this very question, translates the special characters in my title to .../using-special-characters-in-urls! This seems like a great solution. I guess I could make a function that translates special characters like â to their plain equivalent (like a)? And I suppose I would just drop other characters like /#@"',&? Now that I think of it, there must be some fairly standard, good-practice strategies for getting around problems like this.
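
A minimal sketch of that transliteration idea, assuming the source file and the input are both UTF-8. The function name slugify_name and its character map are hypothetical and deliberately incomplete; any letter missing from the map is silently dropped, so a real version would need a much larger map per language.

```php
<?php
// Hypothetical helper: map a few accented Latin letters to ASCII, then drop
// everything that is not a letter, digit, or hyphen.
function slugify_name(string $name): string
{
    // Illustrative map only; extend it for the languages you expect.
    $map = [
        'Ê' => 'E', 'Ë' => 'E', 'Š' => 'S',
        'â' => 'a', 'é' => 'e', 'ê' => 'e', 'ë' => 'e', 'š' => 's',
    ];
    $ascii = strtr($name, $map);

    // Collapse runs of whitespace into single hyphens.
    $ascii = preg_replace('/\s+/u', '-', trim($ascii));

    // Remove whatever is still outside [A-Za-z0-9-], e.g. /#@"',&
    $ascii = preg_replace('/[^A-Za-z0-9-]/', '', $ascii);

    return strtolower($ascii);
}

echo slugify_name('LAEFÊVËŠI'), "\n";                        // laefevesi
echo slugify_name('Using spéciâl characters in URLs'), "\n"; // using-special-characters-in-urls
```

Two different names can collapse to the same slug this way (for example "LAEFÊVËŠI" and "LAEFEVESI"), so the slug would have to be checked for uniqueness before it is saved.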

EDIT2: Actually, now that I think about it (more) - I really want this thing to be usable by people of any language (not just English), so I would really love to be able to have special characters in the urls. Having said this, I've just found that Google doesn't interpret â as a, so people may have a hard time finding the LAEFÊVËŠI translator if I don't translate the letters to normal characters. Ahh!

Upvotes: 4

Views: 122

Answers (1)

user993683

Okay, after that crazy episode, here's what happened:

  • Found out that I was stripping every non-alphanumeric character with PHP's preg_replace(), which is where the special characters were disappearing.
  • Altered the preg_replace() call so it only removes whitespace, and added rawurlencode():

$name = mysqli_real_escape_string($con, rawurlencode( preg_replace("/\s/", '', $name) ));

  • Now everything is stored in the database percent-encoded, safe and sound.
  • Used this rewrite rule: RewriteRule ^([^/.]+)$ process.php?name=$1 [B]
  • Ran around in circles for 2 hours thinking my rewrite was wrong because I was getting "page not found".
  • Realised that process.php wasn't calling rawurlencode() on the name it read in, so the lookup never matched the encoded value in the database:

$name = rawurlencode($_GET['name']);

Now it works.
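
For anyone following along, here is a rough sketch of the round trip described above, with a plain array standing in for the database table so it runs on its own. The helper name encode_name and the sample name are made up for illustration, and the input is assumed to be UTF-8.

```php
<?php
// Stand-in for the database table: an array keyed by the stored name.
$translators = [];

// Saving: strip whitespace, then percent-encode the result.
// The "u" modifier makes PCRE treat the string as UTF-8, so multi-byte
// letters are matched as whole characters rather than byte by byte.
function encode_name(string $raw): string
{
    return rawurlencode(preg_replace('/\s/u', '', $raw));
}

$stored = encode_name('LAEF ÊVËŠI');
$translators[$stored] = 'translator data';

// Serving: PHP percent-decodes the query string before filling $_GET,
// so the value has to be encoded again before it is compared with the
// encoded name that was stored.
$fromGet = rawurldecode('LAEF%C3%8AV%C3%8B%C5%A0I'); // simulates $_GET['name']
$lookup  = rawurlencode($fromGet);

echo $stored, "\n";                                             // LAEF%C3%8AV%C3%8B%C5%A0I
echo isset($translators[$lookup]) ? 'found' : 'missing', "\n";  // found
```

The second half is the step that was missing in process.php: without the extra rawurlencode(), the decoded "LAEFÊVËŠI" is compared against the encoded "LAEF%C3%8AV%C3%8B%C5%A0I" and nothing matches.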

WOO!

Sleep time.

Upvotes: 1
