Knows Not Much

Reputation: 31546

What's wrong with this Regular Expression for URL

I am writing a regular expression which should match URLs of this type:

http(s)://a.b.c.domain.company.com(:8000)

The protocol can be http or https, and the port is optional.

I have written this:

$reg = "^(http|https)(\://)([a-zA-Z0-9\-\.]){6,}(\:[0-9]*)?\/?"
$url1 = "http://uat.upm.goal.services.ps.com"
$url2 = "http://uat.upm.goal.services.ps.com:9000/"
$url3 = "http://uat.upm.goal.services.ps.com:9000?name=foo"
$flag1 = $url1 -Match $reg
$flag2 = $url2 -Match $reg
$flag3 = $url3 -Match $reg
echo $flag1
echo $flag2
echo $flag3   

I want $url1 and $url2 to match the regex, but $url3 should fail the match (because it contains a query string). I want the URL to end at either .com OR .com:8000 OR .com:8000/

I don't want anything after the (optional) port and /.

Upvotes: 0

Views: 64

Answers (2)

klarki

Reputation: 915

try "^(http|https)(\://)([a-zA-Z0-9\-\.]){6,}(\:[0-9]*)?\/?"

for urls without query part use this:
"^(http|https)(\://)([a-zA-Z0-9\-\.]){6,}(\:[0-9]*)?\/?$"

$ means the end of line / string

I removed the ^ at the end since it's a special char meaning the beginning of the line

I changed {6} to {6,}, which means that there have to be at least 6 chars from the group

I tested this in awk and it matches:

awk='/^(http|https)(\:\/\/)([a-zA-Z0-9\-\.]){6,}(\:[0-9]*)?\/?$/'
echo "http://u.ucm.project.services.ps.com" | awk "$awk {print\$0}"
echo "https://z.ucm.project.services.ps.com:22400/" | awk "$awk {print\$0}"
echo "http://uat.upm.goal.services.ps.com:9000?name=foo" | awk "$awk {print\$0}"

as you wanted, only the first two match.
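
To make the effect of the trailing $ visible, here is a minimal sketch that runs the pattern with and without the anchor. It uses grep -E rather than awk or PowerShell, so treat it as an illustration of the anchoring only; check_url is a hypothetical helper, and the character class is written as [a-zA-Z0-9.-] (assumed equivalent here, since - placed last and . inside brackets are both literal).

```shell
#!/bin/sh
# check_url is a hypothetical helper (not from the original post):
# prints "match" or "no match" for a URL tested against an extended regex.
check_url() {
    # $1 = extended regular expression, $2 = URL to test
    if printf '%s\n' "$2" | grep -Eq -- "$1"; then
        echo "match"
    else
        echo "no match"
    fi
}

# Same pattern twice; the only difference is the trailing $ anchor.
unanchored='^(http|https)://[a-zA-Z0-9.-]{6,}(:[0-9]*)?/?'
anchored='^(http|https)://[a-zA-Z0-9.-]{6,}(:[0-9]*)?/?$'

url3='http://uat.upm.goal.services.ps.com:9000?name=foo'

check_url "$unanchored" "$url3"   # prints "match": the regex stops after :9000 and ignores the rest
check_url "$anchored" "$url3"     # prints "no match": $ forbids anything after the optional /
```

Without $, a regex match only has to cover a prefix of the string, which is why the query part slipped through.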

Upvotes: 1

sinelaw

Reputation: 16553

You are missing + after the letter groups. So ([a-zA-Z0-9\-\.]){6} should probably be ([a-zA-Z0-9\-\.]+){6}, so that at least one character and possibly more are there.

Also, the {6} doesn't do what you expect (match a domain with 6 dots) because of the way you wrote it. Either remove it and allow any number of dot-separated domain parts, or change it to something like:

([a-zA-Z0-9\-]+\.){6}
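
A rough sketch of the difference, using grep -E as a stand-in for the PowerShell engine (an assumption; host_matches and both variable names are hypothetical). Note that a host like uat.upm.goal.services.ps.com has six labels but only five dots, so this sketch uses {5} followed by a final label instead of {6}; adjust the count to whatever depth you actually need.

```shell
#!/bin/sh
# host_matches is a hypothetical helper: prints "match" or "no match"
# for a host name tested against an extended regex.
host_matches() {
    # $1 = extended regular expression, $2 = host name to test
    if printf '%s\n' "$2" | grep -Eq -- "$1"; then
        echo "match"
    else
        echo "no match"
    fi
}

# {6} applied to a single-character class: exactly six characters, dots included.
six_chars='^[a-zA-Z0-9.-]{6}$'
# Five "label." groups plus a final label: exactly six dot-separated parts.
six_labels='^([a-zA-Z0-9-]+\.){5}[a-zA-Z0-9-]+$'

host_matches "$six_chars"  'ps.com'                        # prints "match": six characters in total
host_matches "$six_chars"  'uat.upm.goal.services.ps.com'  # prints "no match": longer than six characters
host_matches "$six_labels" 'uat.upm.goal.services.ps.com'  # prints "match": six labels
host_matches "$six_labels" 'ps.com'                        # prints "no match": only two labels
```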

Upvotes: 1
