kramer65

Reputation: 53873

How to get download location from javascript link?

I'm trying to programmatically download a PDF from a website where the download link runs some JavaScript instead of pointing at a file:

<a href="javascript:__doPostBack('downloadTop','')">Download</a>

Since wget or any comparable tool would obviously fail on such a link, I decided to use Selenium with PhantomJS to emulate a real browser and JavaScript interpreter and see what actually happens when I "click" the download link. According to this GitHub issue, PhantomJS currently does not support file downloads. The thing is that I don't even need to download the file with PhantomJS; I just want to get the direct URL of the file so that I can download it with something like wget.

So I tried the following:

>>> from selenium import webdriver
>>> driver = webdriver.PhantomJS()
>>> link = 'https://www.yourticketprovider.nl/LiveContent/tickets.aspx?x=492449&y=8687&px=92AD8EAA22C9223FBCA3102EE0AE2899510C03E398A8A08A222AFDACEBFF8BA95D656F01FB04A1437669EC46E93AB5776A33951830BBA97DD94DB1729BF42D76&rand=a17cafc7-26fe-42d9-a61a-894b43a28046&utm_source=PurchaseSuccess&utm_medium=Email&utm_campaign=SystemMails'
>>> driver.get(link)
>>> for linkElement in driver.find_elements_by_tag_name('a'):
...    print linkElement.get_attribute('href')
...    
https://www.yourticketprovider.nl/
javascript:__doPostBack('downloadTop','')
https://www.yourticketprovider.nl/LiveContent/tickets.aspx?x=492449&y=8687&px=92AD8EAA22C9223FBCA3102EE0AE2899510C03E398A8A08A222AFDACEBFF8BA95D656F01FB04A1437669EC46E93AB5776A33951830BBA97DD94DB1729BF42D76&rand=a17cafc7-26fe-42d9-a61a-894b43a28046&utm_source=PurchaseSuccess&utm_medium=Email&utm_campaign=SystemMails#
etc. etc.

Since I need to get the second element I tried the following:

>>> a = driver.find_elements_by_tag_name('a')[1].click()
>>> print a
None

and from here I'm kinda stuck.

Does anybody know how I can click that link and get the resulting download url? All tips are welcome!

Upvotes: 2

Views: 2075

Answers (1)

Tomáš Zato

Reputation: 53149

tl;dr: The link actually submits the hidden form form#form1

The correct way to figure these things out is to forget the JavaScript and open your browser's developer console, specifically the network panel. I opened the panel and could clearly see:

[Screenshot: network panel showing the POST request]

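The POST here is important: the request is sent with the HTTP POST method, so pointing wget at the URL alone won't work. You have to send the form data along with it (wget can do that with its --post-data option, as can any other HTTP client). I could also inspect both the GET parameters (the ?blah=blah part of the URL) and the POST parameters (sent in the request body, after the headers):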

[Screenshot: GET and POST parameters of the request]

I noticed that the GET parameters match those in the URL you shared, so all you need to copy are the POST parameters. This snippet reads them from the hidden form:

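// Collect the name/value pair of every named field in a form.
function paramsToObject(form) {
  var fields = {};
  for (var i = 0, l = form.elements.length; i < l; i++) {
    var field = form.elements[i];
    if (field.name) {  // controls without a name are never submitted
      fields[field.name] = field.value;
    }
  }
  return fields;
}
// Print the fields of the first form on the page as JSON.
console.log(JSON.stringify(paramsToObject(document.forms[0])));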

The URL to post to can be read from the form's action attribute.
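To replay the request outside the browser, something along these lines might work. It is only a sketch, assuming the page is a standard ASP.NET WebForms page, where __doPostBack(target, argument) fills the hidden __EVENTTARGET and __EVENTARGUMENT fields and then submits the form. The names FormFieldParser, build_postback and download are made up for this example, and I have not run it against the real site, which may also need extra cookies or headers.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin
from urllib.request import HTTPCookieProcessor, Request, build_opener

# Input types a browser does not send when the form is submitted from script.
SKIPPED_TYPES = {"submit", "button", "image", "reset"}


class FormFieldParser(HTMLParser):
    """Collects the action and the named <input> values of the first <form>."""

    def __init__(self):
        super().__init__()
        self.action = ""
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.action = attrs.get("action") or ""
        elif tag == "input" and self._in_form and attrs.get("name"):
            if (attrs.get("type") or "text").lower() not in SKIPPED_TYPES:
                self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


def build_postback(page_url, html, event_target, event_argument=""):
    """Return (post_url, encoded_body) imitating __doPostBack(target, argument)."""
    parser = FormFieldParser()
    parser.feed(html)
    fields = dict(parser.fields)
    # Assumption: these two hidden fields are what __doPostBack sets.
    fields["__EVENTTARGET"] = event_target
    fields["__EVENTARGUMENT"] = event_argument
    # An empty action means "post back to the page itself".
    post_url = urljoin(page_url, parser.action)
    return post_url, urlencode(fields).encode("ascii")


def download(page_url, event_target, out_path):
    """Fetch the page, replay the postback and save the response body."""
    opener = build_opener(HTTPCookieProcessor())  # keep session cookies
    html = opener.open(page_url).read().decode("utf-8", errors="replace")
    post_url, body = build_postback(page_url, html, event_target)
    with opener.open(Request(post_url, data=body)) as resp:
        with open(out_path, "wb") as out:
            out.write(resp.read())
```

You would call it as download(link, 'downloadTop', 'ticket.pdf'), with link being the page URL from the question. The parser only looks at <input> elements, so a form that also relies on <select> or <textarea> values would need more work.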

Upvotes: 2
