Reputation: 767
The general format of a URL is
scheme://domain:port/path?query_string#fragment_id
While domain (and possibly other parts of the URL) may contain Unicode characters, in the following we assume that only ASCII characters are used. Furthermore, we assume that
scheme consists only of letters a–z and A–Z;
domain does not contain :, ?, # or /;
port is a natural number, :port is optional;
path does not contain ? or #, path is optional;
query_string does not contain #, ?query_string is optional;
fragment_id can contain arbitrary characters, #fragment_id is optional.
Here is my code:
@urls = (
    "http://www.example.com/",
    "http://www80.local.com:80/",
    "https://www.ex221.ac.uk:442/perl/rulez?all+q#all.time");
foreach (@urls) {
    print "URL: $_\n";
    ($scheme,$domain,$port,$path,$query,$fragment) = (/(.)(.)(.)(.)(.)(.)/);
    print "SCHEME: $scheme, DOMAIN: $domain, PORT: $port\n";
    print "PATH: $path\n";
    print "QUERY: $query\n";
    print "FRAGMENT: $fragment\n\n";
}
How can I change the regular expression in the code above so that it correctly separates the six components of a URL? I want to use the sample URLs to test that it works as expected.
Upvotes: 2
Views: 1323
Reputation: 385655
Regular expressions are documented in perlre (reference manual) and perlretut (tutorial).
That said, the following is all the information you need to complete your assignment.
To match any one of a set of characters, you can use a character class.
[abcdef] # Matches a, b, c, d, e or f
You can use ranges of letters.
[a-zA-Z] # Matches any lowercase or uppercase letter
To match any character except some, start the class with ^.
[^abcdef] # Matches any character except a, b, c, d, e or f
If you follow something with *, it means zero or more of that something.
ab*c # Matches ac, abc, abbc, abbbc, ...
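As a minimal sketch of how these pieces combine, the snippet below pulls the scheme and domain out of one of the question's sample URLs using only a character class, a negated class, and *. The variable names are illustrative, and the assumptions are the ones stated in the question (ASCII only, scheme made of letters).

```perl
use strict;
use warnings;

# Sample input taken from the question's list of test URLs.
my $url = "http://www80.local.com:80/";

# [a-zA-Z]*  zero or more letters (the scheme)
# [^:/?#]*   zero or more characters that are none of : / ? # (the domain)
my ($scheme, $domain) = $url =~ m{^([a-zA-Z]*)://([^:/?#]*)};

print "SCHEME: $scheme, DOMAIN: $domain\n";
```

Using m{...} instead of /.../ as the delimiter means the slashes in :// need no escaping.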
Don't forget to escape special characters with \ if you don't want their special meaning.
ab\*c # Matches ab*c
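Putting the constructs above together, one possible pattern for the question's URL format is sketched below. It goes slightly beyond what is listed here: it also uses (?:...)? for an optional non-capturing group and the /x flag to allow whitespace and comments inside the pattern, both described in perlre. The helper name split_url is hypothetical, and the pattern relies on the question's simplifying assumptions rather than on the full URL grammar.

```perl
use strict;
use warnings;

# Hypothetical helper: splits a URL into (scheme, domain, port, path,
# query, fragment) under the assumptions stated in the question.
# Optional parts that are absent are returned as undef.
sub split_url {
    my ($url) = @_;
    return $url =~ m{
        ^ ([a-zA-Z]*) ://      # scheme: letters only
        ([^:/?\#]*)            # domain: anything except : / ? and the hash sign
        (?: : ([0-9]*) )?      # optional :port
        ([^?\#]*)              # path: anything except ? and the hash sign
        (?: \? ([^\#]*) )?     # optional ?query_string
        (?: \# (.*) )?         # optional fragment_id after the hash sign
        \z
    }x;
}

for my $url (
    "http://www.example.com/",
    "http://www80.local.com:80/",
    "https://www.ex221.ac.uk:442/perl/rulez?all+q#all.time",
) {
    # Show missing optional parts as empty strings to avoid undef warnings.
    my ($scheme, $domain, $port, $path, $query, $fragment)
        = map { defined $_ ? $_ : '' } split_url($url);
    print "URL: $url\n";
    print "SCHEME: $scheme, DOMAIN: $domain, PORT: $port\n";
    print "PATH: $path\n";
    print "QUERY: $query\n";
    print "FRAGMENT: $fragment\n\n";
}
```

The ? and # characters are escaped where they are meant literally: ? would otherwise be a quantifier, and under /x an unescaped # starts a comment.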
Upvotes: 1
Reputation: 13792
I recommend that you use the URI module:
use strict;
use warnings;
use URI;

my @urls = (
    "http://www.example.com/",
    "http://www80.local.com:80/",
    "https://www.ex221.ac.uk:442/perl/rulez?all+q#all.time");
foreach (@urls) {
    my $uri = URI->new($_);
    print "URL: $_\n";
    print "SCHEME: ", $uri->scheme, "\n";
    print "DOMAIN: ", $uri->host, "\n";
    # port returns the scheme's default port when the URL does not specify one
    print "PORT: ", $uri->port, "\n";
    print "PATH: ", $uri->path, "\n";
    # query and fragment return undef when absent, so fall back to ''
    print "QUERY: ", $uri->query // '', "\n";
    print "FRAGMENT: ", $uri->fragment // '', "\n";
}
Upvotes: 8