Reputation: 35218
When parsing a webpage with Mojo::DOM (or any other framework), it's fairly common to be pulling a resource address that could be either relative or absolute. Is there a shortcut method to translate such a resource address to an absolute URL?
The following mojo command pulls all the stylesheets on mojolicio.us:
$ mojo get http://mojolicio.us "link[rel=stylesheet]" attr href
/mojo/prettify/prettify-mojo-light.css
/css/index.css
And the following script does the same, but also uses URI to translate the resource into an absolute URL.
use strict;
use warnings;
use Mojo::UserAgent;
use URI;
my $url = 'http://mojolicio.us';
my $ua = Mojo::UserAgent->new;
my $dom = $ua->get($url)->res->dom;
for my $csshref ($dom->find('link[rel=stylesheet]')->attr('href')->each) {
    my $cssurl = URI->new($csshref)->abs($url);
    print "$cssurl\n";
}
Outputs:
http://mojolicio.us/mojo/prettify/prettify-mojo-light.css
http://mojolicio.us/css/index.css
Obviously, a relative URL in this context should be made absolute using the URL from which the DOM was loaded. However, I don't know of a way to get a resource's absolute URL other than coding it myself.
There is Mojo::URL#to_abs in Mojolicious. However, I don't know whether it integrates in some way with Mojo::DOM, and by itself it would take more code than URI.
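For comparison, here is a minimal sketch of what Mojo::URL alone might look like, with no DOM involved. The helper name absolutize is hypothetical (it is not part of Mojolicious), and the second test address is made up. It assumes that to_abs wants the base as a Mojo::URL object rather than a plain string, and that an address which is already absolute comes back unchanged.

```perl
use strict;
use warnings;
use Mojo::URL;

# Hypothetical helper, not part of Mojolicious:
# resolve $href against $base and return the result as a string.
sub absolutize {
    my ($href, $base) = @_;

    # to_abs expects the base as a Mojo::URL object, so wrap the string first
    return Mojo::URL->new($href)->to_abs(Mojo::URL->new($base))->to_string;
}

# One relative and one already-absolute address
print absolutize($_, 'http://mojolicio.us'), "\n"
    for '/css/index.css', 'http://example.com/style.css';
```

The extra Mojo::URL->new around the base is the main difference from the URI version, where abs accepts a plain string.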
My ideal solution would be if something like the following were possible from both a script and the command line, but I'm looking for any related insights into using Mojo for parsing:
mojo get http://mojolicio.us "link[rel=stylesheet]" attr href to_abs
Upvotes: 4
Views: 800
Reputation: 20280
I'm not sure why you think it would take more code to use Mojo::URL? In the following example I get the actual request URL from the transaction (there might have been redirects, which I've allowed), which I have called $base.
Then, since $base is an instance of Mojo::URL, I can create a new instance with $base->new. Of course, if that seems too magical, you can replace it with Mojo::URL->new.
use Mojo::Base -strict;
use Mojo::UserAgent;
my $url = 'http://mojolicio.us';
my $ua = Mojo::UserAgent->new->max_redirects(10);
my $tx = $ua->get($url);
my $base = $tx->req->url;
$tx->res
    ->dom
    ->find('link[rel=stylesheet]')
    ->map(sub { $base->new($_->{href})->to_abs($base) })
    ->each(sub { say });
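To try the same chain without a network request, something like the following sketch should work. The inline HTML, the base URL, and the host names are made up for illustration. It uses Mojo::URL->new in place of $base->new, and it includes one href that is already absolute to show that to_abs should leave such an address alone.

```perl
use Mojo::Base -strict;
use Mojo::DOM;
use Mojo::URL;

# Made-up base URL and markup, standing in for a fetched page
my $base = Mojo::URL->new('http://example.com/docs/page.html');
my $dom  = Mojo::DOM->new(<<'HTML');
<link rel="stylesheet" href="/css/index.css">
<link rel="stylesheet" href="http://cdn.example.org/theme.css">
HTML

# Resolve every stylesheet href against $base; each() without a
# callback returns the collection's elements as a plain list
my @urls = $dom->find('link[rel=stylesheet]')
    ->map(sub { Mojo::URL->new($_->attr('href'))->to_abs($base)->to_string })
    ->each;

say for @urls;
```

Here $_->attr('href') is used instead of $_->{href}; both read the same attribute, so pick whichever you find clearer.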
Upvotes: 2