Reputation: 63
Using Python 3, I am trying to download a report. I have tried:
import urllib.request
url = 'https://newss.statistics.gov.my/newss-portalx/ep/epDownloadContentSearch.seam?contentId=145851&actionMethod=ep%2FepDownloadContentSearch.xhtml%3AcontentAction.doDisplayContent'
urllib.request.urlretrieve(url,'tmp_file.xlsx')
but to no avail: it downloads an HTML page instead of the spreadsheet. Is there a Python package that can handle this kind of URL (it looks like a PHP function, but I am not sure) the way Internet Download Manager does, detecting the filename and downloading the file accordingly?
Update 1
If you open the URL in a web browser for the first time, it triggers a login page, but if you open the same URL again (the second time and onwards), it triggers the download.
Upvotes: 0
Views: 340
Reputation: 841
Your Python code runs fine; however, the page you are trying to download from requires a login, so the server returns the login page instead of the file. You might have better luck using Selenium.
https://pypi.org/project/selenium/
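One way to confirm that the server is answering with a login page rather than the spreadsheet is to look at the Content-Type header and the first bytes of the response before saving anything. This is only a sketch: the helper names `looks_like_html` and `check` are made up for illustration, and the test is a heuristic, not something this site documents.

```python
import urllib.request


def looks_like_html(content_type, first_bytes):
    """Heuristic: True if a response appears to be an HTML page, not a file."""
    if content_type and 'html' in content_type.lower():
        return True
    # Fall back to sniffing the start of the body
    head = first_bytes.lstrip().lower()
    return head.startswith(b'<!doctype html') or head.startswith(b'<html')


def check(url):
    """Fetch url and report whether the body looks like HTML (needs network)."""
    with urllib.request.urlopen(url) as response:
        content_type = response.headers.get('Content-Type', '')
        return looks_like_html(content_type, response.read(512))
```

If `check(url)` returns True for the report URL, `urlretrieve` is saving the login page under an .xlsx name, which matches what the question describes.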
Update
Since the download works when you visit the page twice, the same thing can be done with Selenium.
Install the Selenium package using
pip install selenium
Download the Chrome WebDriver that matches your version of Chrome from
https://sites.google.com/a/chromium.org/chromedriver/downloads
Place the driver in the same folder as your Python script.
Then try this script:
import time
from selenium import webdriver

url = 'https://newss.statistics.gov.my/newss-portalx/ep/epDownloadContentSearch.seam?contentId=145851&actionMethod=ep%2FepDownloadContentSearch.xhtml%3AcontentAction.doDisplayContent'
browser = webdriver.Chrome()
# The first visit triggers the login page; the second visit triggers the download
browser.get(url)
browser.get(url)
# Give the download a moment to finish before closing the browser
time.sleep(2)
browser.quit()
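If the second visit succeeds only because the first one set a session cookie (an assumption; the site may rely on something else, such as JavaScript), the same two-visit trick might also work without a browser, using only the standard library. The sketch below keeps cookies between the two requests and reads the filename from the Content-Disposition header, if the server sends one. The names `filename_from_headers` and `download_second_visit` and the fallback name `download.bin` are hypothetical.

```python
import http.cookiejar
import os
import urllib.request
from email.message import Message


def filename_from_headers(headers, default='download.bin'):
    """Return the filename in Content-Disposition, or default if absent."""
    name = headers.get_filename()
    # Drop any directory part so the server cannot choose where we write
    return os.path.basename(name) if name else default


def download_second_visit(url):
    """Visit url twice with a shared cookie jar and save the second response."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # First visit: assumed to set the session cookie; the body is discarded
    opener.open(url).close()
    # Second visit: assumed to return the file itself
    with opener.open(url) as response:
        name = filename_from_headers(response.headers)
        with open(name, 'wb') as out:
            out.write(response.read())
    return name


# Offline check of the header parsing, using a hand-built header
demo = Message()
demo['Content-Disposition'] = 'attachment; filename="report.xlsx"'
print(filename_from_headers(demo))
```

Calling `download_second_visit(url)` with the report URL would then save the file under whatever name the server reports. If the result is still an HTML page, the cookie assumption does not hold for this site and the Selenium route is the safer bet.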
Upvotes: 1