Eka

Reputation: 15002

How to validate a hyperlink from different links using PHP

Can you please tell me how to validate a hyperlink against a set of different hyperlinks? For example:

I want to fetch these links separately from a website using Simple HTML DOM, matching each one on the bolded part of its address (between the two stars):

1 http://**www.website1.com**/1/2/
2 http://**news.website2.com**/s/d
3 http://**website3.com/news**/gds

I know this can be done with preg_match, but I am having a hard time understanding preg_match. Can anyone give me a preg_match script to validate these websites? Can you also explain what this means:

preg_match('|^http(s)?://[a-z0-9-]+(.[a-z0-9-]+)*(:[0-9]+)?(/.*)?$|i', $url)

What are those random-looking characters in the preg_match pattern? What do they mean?

Upvotes: 0

Views: 201

Answers (2)

rodneyrehm

Reputation: 13557

Have a look at "In search of the perfect URL validation regex".

Upvotes: 0

Arkh

Reputation: 8459

If you want to learn about regular expressions, a good place to start is the regular-expressions.info website.

And if you want to use them more, the book Mastering Regular Expressions is a must-read.

Edit: here is a simple walkthrough, though:

  • The first parameter of preg_match is the regexp string. The second is the string you're testing against. An optional third one can be passed: an array in which everything captured is stored.
  • The | characters delimit the regexp from its options. What is between the two | is the regexp; the i after the closing one is an option (it makes the regexp case-insensitive).
  • The leading ^ anchors the match at the start of the string.
  • http is matched literally; then (s)? means you want one or no s character, and you want to "capture" it.
  • :// is matched literally; then [a-z0-9-]+ is one or more characters (not zero), each a letter, a digit or a hyphen. This is the first label of the host name.
  • (.[a-z0-9-]+)* is wrong, because an unescaped . matches any character. It should be (\.[a-z0-9-]+)* to match any number (even 0) of sequences formed by a literal dot followed by at least one letter, digit or hyphen. Since the group is repeated, only the last sequence it matched ends up in the captures.
  • (:[0-9]+)? captures one or no sequence formed by : followed by one or more digits. It's used to get the URL port.
  • (/.*)? captures the rest of the URL: a slash followed by any number of any characters.
  • $ anchors the match at the end of the string.
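
To make the walkthrough concrete, here is a minimal sketch of the pattern with the dot escaped, plus one possible way to test a link against the three site prefixes from the question. The names URL_PATTERN, match_url and url_has_prefix are made up for this example (they are not part of PHP or Simple HTML DOM), and the nullable return type assumes PHP 7.1 or later.

```php
<?php
// Pattern from the question with the dot escaped, so it only matches a literal "."
const URL_PATTERN = '|^http(s)?://[a-z0-9-]+(\.[a-z0-9-]+)*(:[0-9]+)?(/.*)?$|i';

// Hypothetical helper: the captured groups on a match, null otherwise.
function match_url(string $url): ?array
{
    return preg_match(URL_PATTERN, $url, $m) === 1 ? $m : null;
}

// Hypothetical helper: does $url start with "http(s)://" followed by $prefix,
// ending there or continuing with a "/"?
function url_has_prefix(string $url, string $prefix): bool
{
    // preg_quote escapes regex metacharacters (and the "|" delimiter) in $prefix.
    $pattern = '|^https?://' . preg_quote($prefix, '|') . '(/.*)?$|i';
    return preg_match($pattern, $url) === 1;
}

$m = match_url('https://news.website2.com:8080/s/d');
// $m[0]: whole match, $m[1]: optional "s", $m[2]: last ".label" of the host,
// $m[3]: port including the colon, $m[4]: path.
print_r($m);

var_dump(match_url('http://exa_mple.com/'));                                   // NULL
var_dump(url_has_prefix('http://website3.com/news/gds', 'website3.com/news')); // bool(true)
```

The second call returns null only because the dot is escaped: with the original (.[a-z0-9-]+)*, the unescaped . would happily match the underscore and the URL would pass. For the prefix check, each href pulled out of the page could be passed to url_has_prefix once per site ('www.website1.com', 'news.website2.com', 'website3.com/news').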

Upvotes: 1
