shaimagz

Reputation: 1265

Convert Regular Expression pattern from JavaScript to PCRE (Perl)

This is my JavaScript regex pattern:

    url = "http://www.amazon.com/gp";
    hostname = /^((\w+):\/\/\/?)?((\w+):?(\w+)?@)?([^\/\?:]+):?(\d+)?(\/?[^\?#;\|]+)?([;\|])?([^\?#]+)?\??([^#]+)?#?(\w*)/.exec(url) || [];
    // hostname[6] would be "www.amazon.com"

What additional changes do I need to make for it to work with PCRE instead of JavaScript? Or is that not possible, so that I need to build an entirely new pattern for PCRE?

This is a simplified version of my C++ code:

#include <iostream>
#include <string>
#include <pcrecpp.h>

using std::string;

int main(void)
{
    string text = "http://www.amazon.com";
    string hostname;
    pcrecpp::RE re("^((\\w+):\\/\\/\\/?)?((\\w+):?(\\w+)?@)?([^\\/\\?:]+):?(\\d+)?(\\/?[^\\?#;\\|]+)?([;\\|])?([^\\?#]+)?\\??([^#]+)?#?(\\w*)");
    if (re.PartialMatch(text, &hostname))
    {
        std::cout << "match: " << hostname << "\n";
    } else {
        std::cout << "no match.\n";
    }
    return 0;
}

Thanks.

Upvotes: 1

Views: 2151

Answers (2)

Wolph

Reputation: 80031

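There's no need to convert it: PCRE accepts this pattern as written. The only things you have to take care of are the escaping (in a C++ string literal every regex backslash must be doubled) and the / delimiters, which belong to the JavaScript regex literal and are not part of the pattern itself.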

Do note that a regular expression might not be what you want to use here, or at least not directly like this. There are lots of URL parsing libraries that are much better suited to this task, HTParse for example.

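Your C++ code should work, but PartialMatch fills its output arguments in capture-group order, so a single &hostname receives group 1 (the scheme, "http://") rather than the hostname. Your regex also has a lot of optional groups, which makes it easy to lose track of which group the hostname ends up in; counting opening parentheses, it is group 6.

As hacky as it may be, my edit works for this input. It passes a throwaway string for each of the first five groups so that hostname receives the sixth: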

string text = "http://www.amazon.com";
string tmp;
string hostname;
pcrecpp::RE re("^((\\w+):\\/\\/\\/?)?((\\w+):?(\\w+)?@)?([^\\/\\?:]+):?(\\d+)?(\\/?[^\\?#;\\|]+)?([;\\|])?([^\\?#]+)?\\??([^#]+)?#?(\\w*)");
if(re.PartialMatch(text, &tmp, &tmp, &tmp, &tmp, &tmp, &hostname))
{
    std::cout << "match: " << hostname << "\n";
}else{
    std::cout << "no match. \n";
}

Upvotes: 3

HaxElit

Reputation: 4073

"^((\\w+):\\/\\/\\/?)?((\\w+):?(\\w+)?@)?([^\\/\\?:]+):?(\\d+)?(\\/?[^\\?#;\\|]+)?([;\\|])?([^\\?#]+)?\\??([^#]+)?#?(\\w*)"

Upvotes: 1
