Miller

Reputation: 35218

Mojo::DOM shortcut to get absolute url for a resource?

When parsing a webpage with Mojo::DOM (or any other framework), it's fairly common to be pulling a resource address that could be either relative or absolute. Is there a shortcut method to translate such a resource address to an absolute URL?

The following mojo command pulls all the stylesheets on mojolicio.us:

$ mojo get http://mojolicio.us "link[rel=stylesheet]" attr href
/mojo/prettify/prettify-mojo-light.css
/css/index.css

And the following script does the same, but also uses URI to translate the resource into an absolute URL.

use strict;
use warnings;

use Mojo::UserAgent;
use URI;

my $url = 'http://mojolicio.us';

my $ua = Mojo::UserAgent->new;
my $dom = $ua->get($url)->res->dom;

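# map(attr => 'href') calls ->attr('href') on each matched element;
# current Mojo::Collection no longer forwards ->attr itself.
for my $csshref ($dom->find('link[rel=stylesheet]')->map(attr => 'href')->each) {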
    my $cssurl = URI->new($csshref)->abs($url);
    print "$cssurl\n";
}

Outputs:

http://mojolicio.us/mojo/prettify/prettify-mojo-light.css
http://mojolicio.us/css/index.css

Obviously, a relative URL in this context should be made absolute using the URL the DOM was loaded from. However, I don't know of a way to get a resource's absolute URL other than coding it myself.

Mojolicious does have Mojo::URL's to_abs method. However, I don't know whether it integrates with Mojo::DOM in some way, and on its own it seems to take more code than URI.

My ideal solution would be something like the following, usable from both a script and the command line, but I'm looking for any related insights into using Mojo for parsing:

mojo get http://mojolicio.us "link[rel=stylesheet]" attr href to_abs

Upvotes: 4

Views: 800

Answers (1)

Joel Berger

Reputation: 20280

I'm not sure why you think it would take more code to use Mojo::URL. In the following example I take the actual request URL from the transaction and call it $base. I've allowed redirects, so this may differ from the URL originally requested.

Since $base is an instance of Mojo::URL, I can create a new instance with $base->new. Of course, if that seems too magical, you can replace it with Mojo::URL->new.

use Mojo::Base -strict;
use Mojo::UserAgent;

my $url = 'http://mojolicio.us';

my $ua = Mojo::UserAgent->new->max_redirects(10);
my $tx = $ua->get($url);
my $base = $tx->req->url;

$tx->res
  ->dom
  ->find('link[rel=stylesheet]')
  ->map(sub{$base->new($_->{href})->to_abs($base)})
  ->each(sub{say});

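The same resolution step can be tried without a network request. The following is only a minimal sketch: the base URL path and the inline HTML are made-up stand-ins for a fetched page, and it uses Mojo::URL->new rather than $base->new. It resolves a root-relative href against the base and leaves an already absolute href unchanged.

```perl
use Mojo::Base -strict;
use Mojo::DOM;
use Mojo::URL;

# Hypothetical base URL and markup, standing in for a fetched page.
my $base = Mojo::URL->new('http://mojolicio.us/perldoc');
my $dom  = Mojo::DOM->new(<<'HTML');
<link rel="stylesheet" href="/css/index.css">
<link rel="stylesheet" href="http://example.com/theme.css">
HTML

# Resolve each href against the base; to_abs returns a copy of
# the URL unchanged when it is already absolute.
my @urls = $dom->find('link[rel=stylesheet]')
  ->map(sub { Mojo::URL->new($_->attr('href'))->to_abs($base)->to_string })
  ->each;

say for @urls;
```

This should print http://mojolicio.us/css/index.css followed by http://example.com/theme.css.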
Upvotes: 2
