user2058002


match first character in a regex?

I have the following regex:

http://([^:]*):?([0-9]*)(/.*)

When I match that against http://brandonhsiao.com/essays/showers.html, the parentheses grab: http://brandonhsiao.com/essays and /showers.html. How can I get it to grab http://brandonhsiao.com and /essays/showers.html?

Upvotes: 2

Views: 146

Answers (3)

Joseph Myers

Reputation: 6552

Put a question mark after the first * to make it non-greedy. Right now the part of your regex that matches the hostname is greedy, so it grabs everything all the way up to the last /.

http://([^:]*?):?([0-9]*)(/.*)
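
To see the difference between the two quantifiers, here is a minimal sketch using Python's re module. Python is an assumption on my part (the question is about Lisp), and the variable names are made up for illustration; the two patterns are the ones from this thread.

```python
import re

url = "http://brandonhsiao.com/essays/showers.html"

# Original pattern: the greedy [^:]* runs to the end of the string and
# backtracks only as far as the last "/".
greedy = re.match(r"http://([^:]*):?([0-9]*)(/.*)", url)

# Same pattern with *? : the first group grows one character at a time
# and stops as soon as the rest of the pattern can match.
lazy = re.match(r"http://([^:]*?):?([0-9]*)(/.*)", url)

print(greedy.groups())  # ('brandonhsiao.com/essays', '', '/showers.html')
print(lazy.groups())    # ('brandonhsiao.com', '', '/essays/showers.html')
```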

But that's not even what I would recommend. Try this instead:

(http://[^\s/]+)([^\s?#]*)

$1 should contain http://brandonhsiao.com and $2 should contain /essays/showers.html; any query string or hash fragment after the path is left out of $2.

Note that this is not designed to validate a URL, just to divide a URL up into the portion before the path, and the path itself. For example, it would happily accept invalid characters as part of the hostname. However, it does work fine for URLs with or without paths.
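
As a rough illustration of those last points, here is a sketch in Python's re module (an assumption, since I have not tried this in Lisp). The helper name split_url and the example.com URL are hypothetical; the pattern is the one above.

```python
import re

# Group 1 is everything before the path, group 2 is the path itself.
SPLIT_URL = re.compile(r"(http://[^\s/]+)([^\s?#]*)")

def split_url(url):
    """Return (prefix, path), or None if the pattern does not match."""
    m = SPLIT_URL.search(url)
    return (m.group(1), m.group(2)) if m else None

# URL with a path.
print(split_url("http://brandonhsiao.com/essays/showers.html"))
# ('http://brandonhsiao.com', '/essays/showers.html')

# URL without a path: the second group is simply empty.
print(split_url("http://brandonhsiao.com"))
# ('http://brandonhsiao.com', '')

# Port stays in group 1; query string and fragment are dropped from group 2.
print(split_url("http://example.com:8080/a/b?x=1#top"))
# ('http://example.com:8080', '/a/b')
```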

P.S. I don't know exactly what you are doing with this in Lisp, so I have taken the liberty of only testing it in other PCRE-compatible environments. Usually I test my answers in the exact context where they will be used.

# Match against $_, using | as the delimiter so the slashes need no escaping.
$_ = "http://brandonhsiao.com/essays/showers.html";
m|(http://[^\s/]+)([^\s?#]*)|;
print "1 = '$1' and 2 = '$2'\n";

# [j@5 ~]$ perl test2.pl
# 1 = 'http://brandonhsiao.com' and 2 = '/essays/showers.html'

Upvotes: 3

Kevin Lee

Reputation: 718

http:\/\/([^:]*?)(\/.*)

The *? is a non-greedy quantifier, so the first group stops at the first slash (the one just after .com) instead of the last one.

See http://rubular.com/r/VmU2ghAX0k for match groups
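
If you would rather check it locally, here is a small sketch in Python's re module (an assumption; the rubular link uses Ruby). The helper name host_and_path is made up, and the slashes are left unescaped because Python does not need them escaped.

```python
import re

# Same pattern as above, without the backslashes before "/".
SIMPLE = re.compile(r"http://([^:]*?)(/.*)")

def host_and_path(url):
    """Return (host, path), or None if the pattern does not match."""
    m = SIMPLE.match(url)
    return m.groups() if m else None

print(host_and_path("http://brandonhsiao.com/essays/showers.html"))
# ('brandonhsiao.com', '/essays/showers.html')

# This pattern has no port part, and [^:] cannot cross the colon,
# so a URL with an explicit port does not match at all.
print(host_and_path("http://brandonhsiao.com:8080/essays/showers.html"))
# None
```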

Upvotes: 0

snf

Reputation: 3077

http://([^/:]*):?([0-9]*)(/.*)

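Your first group matches everything except :, and I added / to it. The [^...] construct is a negated character class: it matches any character except the ones listed inside the brackets, so the hostname group now stops at the first slash. Everything else is the same as in your regex.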
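
A quick sketch of how the three groups come out, written with Python's re module (an assumption, since the question is about Lisp). The helper name parse and the port number 8080 are made up for illustration.

```python
import re

# Host (no "/" or ":"), optional port digits, then the path.
HOST_PORT_PATH = re.compile(r"http://([^/:]*):?([0-9]*)(/.*)")

def parse(url):
    """Return (host, port, path), or None if the pattern does not match."""
    m = HOST_PORT_PATH.match(url)
    return m.groups() if m else None

print(parse("http://brandonhsiao.com/essays/showers.html"))
# ('brandonhsiao.com', '', '/essays/showers.html')

print(parse("http://brandonhsiao.com:8080/essays/showers.html"))
# ('brandonhsiao.com', '8080', '/essays/showers.html')
```

Note that a URL with no path at all does not match, because the last group requires a slash.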

Hope it helped!

Upvotes: 0
