xShirase

Reputation: 12389

PhantomJS/CasperJS: get a URL from a JS script inside the page

I'm building a scraper with PhantomJS/CasperJS.

At this point, I need to extract a URL that appears in the page only inside a JS script.

Example of the page source code:

<script>
    queueRequest('URL.aspx?var1='+VAR1+'&var2='+VAR2, 'getPageMenu');
</script>

I have no problem evaluating VAR1 and VAR2, since they are in the page context, but I also need URL, which is hardcoded in the script and not stored in any variable I could reference. URL is, of course, different on each page, and I have no way of guessing it. Any ideas?

My ideas:

  1. Since the URL is requested on page load to fill a div with AJAX, I was thinking of capturing the XHR request, but I don't know how.

  2. I managed to get the script element I need using document.getElementsByTagName('script'). That may be one way to go, but how do I get only the line I need (the one starting with queueRequest) out of 200+ lines?

So, to make my question clear:

Which idea is better, 1 or 2?

If 1: how do I capture the request URL with Casper?

If 2: how do I get the right line out of my script?
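For option 1, CasperJS forwards PhantomJS's network events, so every outgoing request (including the AJAX call that queueRequest makes on load) is emitted as a resource.requested event whose first argument carries the request URL. A minimal sketch, assuming the AJAX call is the only request whose URL contains ".aspx?var1="; watchQueueRequest and onFound are hypothetical names, and casper is the instance from require('casper').create():

```javascript
// Hypothetical helper: listen for outgoing requests on a CasperJS instance and
// report the first one that looks like the queueRequest() AJAX call.
// Assumption: that call is the only request whose URL contains ".aspx?var1=".
function watchQueueRequest(casper, onFound) {
  var reported = false;
  casper.on('resource.requested', function (requestData) {
    // requestData.url is the full URL of the outgoing request.
    var url = requestData.url;
    if (!reported && /\.aspx\?var1=/.test(url)) {
      reported = true;
      // Pass back the URL without its query string, plus the full URL.
      onFound(url.split('?')[0], url);
    }
  });
}
```

Register the listener before casper.start(...) so the request fired on page load is not missed, e.g. watchQueueRequest(casper, function (base) { casper.echo(base); });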

Upvotes: 0

Views: 964

Answers (1)

struthersneil

Reputation: 2750

If you want to search your script blocks, you can try something like this:

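var found = null;
var scripts = document.getElementsByTagName('script');

for (var i = 0; i < scripts.length; i++)
{
  // textContent is standard DOM and returns a script's source text even
  // though <script> elements are never rendered.
  // [^'?]+ stops at the first quote or "?", so a later "?" on the same
  // line cannot stretch the match.
  var matches = /queueRequest\('([^'?]+)\?/.exec(scripts[i].textContent);

  if (matches)
  {
    found = matches[1];
    break;
  }
}

alert(found);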

There might be tighter ways to implement the same thing, but the regex is roughly what you're after. Note that this only gets you the URL part (everything before the ?) of the first occurrence of queueRequest('something.something?...') in the embedded script blocks.
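To get the value back into a CasperJS script instead of showing an alert, the same kind of search (here using textContent and a regex that stops at the first quote or ?) can run inside casper.evaluate(), which executes a function in the page context and returns its serializable result. A minimal sketch; getQueueRequestUrl is a hypothetical name, and it should be called after the page has loaded, e.g. inside a casper.then() step:

```javascript
// Hypothetical helper: run the script-block search in the page and return
// the URL part of the first queueRequest('...?...') call, or null.
function getQueueRequestUrl(casper) {
  return casper.evaluate(function () {
    // This function runs inside the page, so it can only use page globals
    // such as document, not variables from the outer CasperJS script.
    var scripts = document.getElementsByTagName('script');
    for (var i = 0; i < scripts.length; i++) {
      var matches = /queueRequest\('([^'?]+)\?/.exec(scripts[i].textContent);
      if (matches) {
        return matches[1];
      }
    }
    return null;
  });
}
```

Usage: casper.then(function () { this.echo(getQueueRequestUrl(this)); });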

Upvotes: 2
