paarth batra

Reputation: 1402

Downloading a file from a webpage with a Python script when there is no URL, only an onclick function

A webpage has a "Click to Download" link; clicking it downloads a file. I can download the file manually by opening the page and clicking the link, but I need to download it from a Python script.

If I look at the page source, I can see that the anchor tag runs a JS function:

<a class="download-data-link1" onclick=" document.forms['dataform'].submit()" style="cursor:pointer; vertical-align: middle;">Download in csv</a>

But I don't know the URL of the CSV file, and I am looking for a way to download it via Python.

I know we can download a file with httplib if we have its URL, but I couldn't understand how to get a file without a URL.

I tried a few things, such as adding 'Content-Disposition': 'attachment;filename="data.csv"' to the headers, but it doesn't seem to work. Any ideas?

Upvotes: 4

Views: 10061

Answers (2)

paarth batra

Reputation: 1402

Thanks all for your answers. I want to add how I implemented it.

  1. First, create a Firefox profile. To do that:
  2. Close all Firefox windows.
  3. Open a command prompt and run firefox.exe -P
  4. Create a profile and note the folder where the new profile is created.

You can set some options for the profile here, such as automatically downloading files of certain content types.

Now install Selenium for Python and use the code below:

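from selenium import webdriver

# Raw string, so that "\a" is not read as the bell escape character
download_dir = r"D:\a"
# Folder noted in step 4 (placeholder, replace with your own path)
profile_dir = r"<profile directory here as in step 4>"

fp = webdriver.FirefoxProfile(profile_dir)
# 2 = use the custom folder set in browser.download.dir
fp.set_preference("browser.download.folderList", 2)
fp.set_preference("browser.download.dir", download_dir)
fp.set_preference("browser.download.manager.showWhenStarting", False)
# Must match the MIME type the server sends (for example "text/csv" for a CSV file)
fp.set_preference("browser.helperApps.neverAsk.saveToDisk", "application/octet-stream")

# Selenium 2/3 API; Selenium 4 uses find_element(By.PARTIAL_LINK_TEXT, ...) instead
browser = webdriver.Firefox(firefox_profile=fp)
browser.get("http://pypi.python.org/pypi/selenium")
# you can use your url here
browser.find_element_by_partial_link_text("selenium-2").click()
# Use your method to identify class or link text here
browser.close()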
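
One thing to watch in the snippet above: the browser is closed right after the click, so a slow download could be cut off. A minimal sketch of a wait step follows, using only the standard library. The helper name wait_for_download is hypothetical, and it assumes the download folder is empty before the click and that Firefox keeps a ".part" file next to the target while the download is still running.

```python
import os
import time


def wait_for_download(download_dir, timeout=60.0, poll_interval=0.5):
    """Return the sorted file names in download_dir once a download has finished.

    Hypothetical helper: a download counts as finished when the folder holds
    at least one file and none of the names end in ".part".
    Raises TimeoutError if that does not happen within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        names = sorted(os.listdir(download_dir))
        in_progress = [n for n in names if n.endswith(".part")]
        if names and not in_progress:
            return names
        if time.monotonic() >= deadline:
            raise TimeoutError("no finished download in %s" % download_dir)
        time.sleep(poll_interval)
```

It would be called between the click and the close, for example wait_for_download(download_dir).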

Hope this may help others :)

Upvotes: 3

alecxe

Reputation: 473773

Two basic options can be applied here:

  • mimic the logic behind the onclick() call: in your case, submit the dataform form yourself using requests or mechanize
  • high-level approach - automate a real browser, headless (PhantomJS) or not, using selenium - find the link and click it:

    from selenium import webdriver
    
    driver = webdriver.PhantomJS()
    driver.get('url here')
    
    driver.find_element_by_class_name('download-data-link1').click()
    

Though, as far as I understand, clicking the link triggers a browser "Download" dialog. In that case PhantomJS is not an option, since it doesn't support downloads. With Chrome or Firefox, you would need to tweak the browser capabilities so that files are downloaded automatically without opening the popup.
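
The first option avoids the browser altogether. Here is a rough sketch of it, written with the standard library (html.parser and urllib) rather than requests or mechanize so that it has no dependencies. The question does not show the form itself, so its action, method and input fields are read from the page at run time. The names FormParser, build_form_request and download_via_form are made up for illustration. It also assumes the site needs no login, cookies or JavaScript-computed fields; if it does, this will not be enough.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin
from urllib.request import Request, urlopen


class FormParser(HTMLParser):
    """Collect the action, method and input fields of one named form."""

    def __init__(self, form_name):
        super().__init__()
        self.form_name = form_name
        self.inside = False
        self.found = False
        self.action = ""
        self.method = "get"
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # document.forms['dataform'] matches a form by its name or its id
        if tag == "form" and self.form_name in (attrs.get("name"), attrs.get("id")):
            self.inside = self.found = True
            self.action = attrs.get("action") or ""
            self.method = (attrs.get("method") or "get").lower()
        elif tag == "input" and self.inside and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self.inside = False


def build_form_request(page_url, page_html, form_name="dataform"):
    """Build the request a browser would send when the form is submitted."""
    parser = FormParser(form_name)
    parser.feed(page_html)
    if not parser.found:
        raise ValueError("form %r not found in page" % form_name)
    target = urljoin(page_url, parser.action)  # empty action means the page itself
    data = urlencode(parser.fields)
    if parser.method == "post":
        return Request(target, data=data.encode("ascii"), method="POST")
    # GET forms send their fields in the query string
    # (assumes the action URL has no query string of its own)
    return Request(target + "?" + data if data else target, method="GET")


def download_via_form(page_url, out_path, form_name="dataform"):
    """Fetch the page, submit the form, and save the response body to out_path."""
    with urlopen(page_url) as page:
        page_html = page.read().decode("utf-8", errors="replace")
    with urlopen(build_form_request(page_url, page_html, form_name)) as resp:
        with open(out_path, "wb") as out:
            out.write(resp.read())
```

A call such as download_via_form('url here', 'data.csv') would then write whatever the form submission returns. It is worth checking the browser's network tab first to confirm that the request built here matches what the real click sends.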

Upvotes: 2
