codef0rmer

Reputation: 10530

How to capture the part of URL after .com in regex?

I've come up with the following regular expression to match a valid URL:

^(?:(ftp|http|https):\/\/)?(?:[a-zA-Z]+\.){0,1}(?:[a-zA-Z0-9][a-zA-Z0-9-]+){1}(?:\.[a-zA-Z]{2,6})?$

It treats the scheme (ftp, http, or https) as optional and accepts a domain name with or without a TLD such as .com. I also want to capture everything that comes after .com.

The above regex validates http://stackoverflow.com, localhost, and google.com, but not http://stackoverflow.com/questions/ask.

Upvotes: 1

Views: 801

Answers (1)

anubhava

Reputation: 785286

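To match the rest of the URI, add an optional capturing group ending in \S* at the end of the pattern. The path is then available as the second capturing group: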

^(?:(ftp|http|https):\/\/)?(?:[a-zA-Z]+\.){0,1}(?:[a-zA-Z0-9][a-zA-Z0-9-]+){1}(?:\.[a-zA-Z]{2,6})?(\/|\/\w\S*)?$
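As a quick check of how the capture behaves, here is a minimal Python sketch. The question does not name a language, so Python is an assumption, and `split_url` is a hypothetical helper name; only the pattern itself is copied from above.

```python
import re

# The pattern from the answer, copied verbatim.
# Group 1 is the optional scheme, group 2 the optional path after the host.
URL_RE = re.compile(
    r"^(?:(ftp|http|https):\/\/)?(?:[a-zA-Z]+\.){0,1}"
    r"(?:[a-zA-Z0-9][a-zA-Z0-9-]+){1}(?:\.[a-zA-Z]{2,6})?(\/|\/\w\S*)?$"
)


def split_url(url):
    """Return (scheme, path) when url matches the pattern, else None."""
    m = URL_RE.match(url)
    if m is None:
        return None
    # Groups that did not participate in the match come back as None.
    return m.group(1), m.group(2)


print(split_url("http://stackoverflow.com/questions/ask"))  # ('http', '/questions/ask')
print(split_url("google.com/"))                             # (None, '/')
print(split_url("localhost"))                               # (None, None)
```

Note that the second alternative requires a word character right after the slash, so a path that starts with something else (for example `/?q=1`) is rejected by this pattern.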

However, to parse the various components of a URL, it is much better to use the built-in parse_url function.
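parse_url is PHP's built-in. For readers working in another language, a rough analogue in Python (an assumption, since the question does not state a language) is `urllib.parse.urlsplit` from the standard library. The sketch below uses a hypothetical helper, `host_and_rest`, to return the host and everything after it.

```python
from urllib.parse import urlsplit


def host_and_rest(url):
    """Return (hostname, everything after the host) using the stdlib parser."""
    parts = urlsplit(url)
    rest = parts.path
    # Re-attach the query string and fragment if they are present.
    if parts.query:
        rest += "?" + parts.query
    if parts.fragment:
        rest += "#" + parts.fragment
    return parts.hostname, rest


print(host_and_rest("http://stackoverflow.com/questions/ask"))
# ('stackoverflow.com', '/questions/ask')
print(host_and_rest("http://stackoverflow.com/questions/ask?tab=votes#top"))
# ('stackoverflow.com', '/questions/ask?tab=votes#top')
```

One caveat: `urlsplit` only recognizes a host when the input has a scheme or a leading `//`, so a bare `google.com/x` comes back with no hostname and the whole string as the path.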

An alternative that also validates local URLs:

^(?:(ftp|http|https):\/\/)?(?:[a-zA-Z0-9.]+\.){0,1}(?:[a-zA-Z0-9][a-zA-Z0-9-]+){1}(?:\.[a-zA-Z]{2,6})?(\/|\/[\w#!:.?+=&%@!\-\/]*)?$

e.g. 172.18.11.178
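To see the alternative pattern accept an IP-style host, here is a short Python sketch (again assuming Python; `local_path` is a hypothetical helper name and the second test URL is made up for illustration). The subdomain group now allows digits and dots, which is what lets 172.18.11.178 through.

```python
import re

# The alternative pattern from the answer, copied verbatim.
LOCAL_URL_RE = re.compile(
    r"^(?:(ftp|http|https):\/\/)?(?:[a-zA-Z0-9.]+\.){0,1}"
    r"(?:[a-zA-Z0-9][a-zA-Z0-9-]+){1}(?:\.[a-zA-Z]{2,6})?"
    r"(\/|\/[\w#!:.?+=&%@!\-\/]*)?$"
)


def local_path(url):
    """Return (matched, path): whether url matches, and the captured path if any."""
    m = LOCAL_URL_RE.match(url)
    if m is None:
        return False, None
    return True, m.group(2)


print(local_path("172.18.11.178"))
# (True, None)
print(local_path("http://172.18.11.178/app/index.php?id=5"))
# (True, '/app/index.php?id=5')
```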

Demo: http://regex101.com/r/vV0sB5

Upvotes: 1
