MSH

Reputation: 429

Conversion between utf-8 and unicode in web servers

I want to know how web servers convert percent-encoded UTF-8 characters in a URL to Unicode. How do they handle problems such as double URL encoding and non-shortest-form UTF-8 sequences, as explained here?

For example, converting http://www.example.com/dir1/index.html?name=%D8%A7%D9%84%D8%A7%D8%B3%D9%85%D8%A7

to http://www.example.com/dir1/index.html?name=الاسما

I wrote a C++ program that does this conversion, but in general I want to know how web servers like Apache or nginx do it.

Upvotes: 1

Views: 200

Answers (1)

technusm1

Reputation: 533

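Do you mean something like the following? It is based on "Encode/Decode URLs in C++" and percent-decodes the URL into raw bytes, which for your example are the UTF-8 encoding of the Arabic text: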

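#include <cctype>
#include <iostream>
#include <string>

using std::string;
using std::cout;

// Percent-decode a URL. The result is a raw byte string; no UTF-8
// validation is performed here.
string urlDecode(const string &SRC) {
    string ret;
    for (string::size_type i = 0; i < SRC.length(); i++) {
        // Decode '%' only when it is followed by two hex digits.
        if (SRC[i] == '%' && i + 2 < SRC.length()
                && std::isxdigit(static_cast<unsigned char>(SRC[i+1]))
                && std::isxdigit(static_cast<unsigned char>(SRC[i+2]))) {
            ret += static_cast<char>(std::stoi(SRC.substr(i+1, 2), nullptr, 16));
            i += 2;
        } else {
            // Literal characters and malformed escapes are copied unchanged.
            ret += SRC[i];
        }
    }
    return ret;
}

int main()
{
    string s = "http://www.example.com/dir1/index.html?name=%D8%A7%D9%84%D8%A7%D8%B3%D9%85%D8%A7";
    cout << urlDecode(s) << '\n';
}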
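
Percent-decoding only gives you bytes. To address the non-shortest-form part of the question, those bytes still have to be checked as UTF-8 before they are treated as Unicode. Below is a minimal sketch of a strict decoder, assuming C++17; the name decodeUtf8Strict is my own and is not taken from Apache or nginx, and I am not claiming this is how either server implements it. It rejects overlong sequences, surrogates, truncated input and values above U+10FFFF instead of trying to repair them.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper (not from any server's source): decode a byte string
// as strict UTF-8 into Unicode code points. Returns std::nullopt for
// malformed input: invalid lead bytes, truncated sequences, non-shortest
// (overlong) forms, UTF-16 surrogates, or values above U+10FFFF.
std::optional<std::vector<char32_t>> decodeUtf8Strict(const std::string& bytes) {
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        char32_t cp = 0;
        std::size_t extra = 0;   // continuation bytes expected after the lead
        char32_t minValue = 0;   // smallest code point allowed for this length

        if (lead < 0x80)                { cp = lead;        extra = 0; minValue = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; minValue = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; minValue = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; minValue = 0x10000; }
        else return std::nullopt;       // stray continuation or invalid lead byte

        // The sequence must not run past the end of the input.
        if (bytes.size() - i <= extra) return std::nullopt;

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) return std::nullopt;  // not 10xxxxxx
            cp = (cp << 6) | (c & 0x3F);
        }

        if (cp < minValue) return std::nullopt;                  // overlong form
        if (cp >= 0xD800 && cp <= 0xDFFF) return std::nullopt;   // surrogate
        if (cp > 0x10FFFF) return std::nullopt;                  // out of range

        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

You would call it on the output of urlDecode, for example decodeUtf8Strict(urlDecode(s)), and refuse the request if it returns std::nullopt. The overlong check is the important one: the byte pair C0 AF would otherwise decode to '/', which lets a path separator slip past filters that only look at the raw bytes. As for double encoding, the usual advice is to decode exactly once and validate the result, rather than decoding repeatedly until no '%' is left.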

Upvotes: 1
