Reputation: 763
I am trying to post several parameters to this [url][1] and press 'Submit' to download the generated CSV file.
I think at least 5 steps are needed.
Upvotes: 8
Views: 1033
Reputation: 66431
Since no one has posted a solution yet, here you go. You won't get far with requests, so Selenium is your best choice here. If you want to use the script below without modification, check that:
- dl_dir = '/tmp' points to a directory you want the downloads to land in
- chromedriver is installed, or change the driver to Firefox in the code (and adapt the download dir configuration to what Firefox expects)

Here is the environment I tested with:
$ python -V
Python 3.5.3
$ chromedriver --version
ChromeDriver 2.33.506106 (8a06c39c4582fbfbab6966dbb1c38a9173bfb1a2)
$ pip list --format=freeze | grep selenium
selenium==3.6.0
I commented almost every line, so let the code do the talking:
import os
import time
from selenium import webdriver
from selenium.webdriver.common import by
from selenium.webdriver.remote.webelement import WebElement
from selenium.webdriver.support import ui, expected_conditions as EC
def main():
    dl_dir = '/tmp'  # temporary download dir so I don't spam the real dl dir with csv files
    # check which files exist before the scraping starts (will be explained later)
    csvs_old = {file for file in os.listdir(dl_dir) if file.startswith('NXSA-Results-') and file.endswith('.csv')}
    # I use chrome, so check that you have chromedriver installed
    # pass the custom dl dir to the browser instance
    chrome_options = webdriver.ChromeOptions()
    prefs = {'download.default_directory': dl_dir}
    chrome_options.add_experimental_option('prefs', prefs)
    driver = webdriver.Chrome(chrome_options=chrome_options)
    # open page
    driver.get('http://nxsa.esac.esa.int/nxsa-web/#search')
    # wait for the search ui to appear (abort after 10 secs)
    # once there, unfold the filters panel
    ui.WebDriverWait(driver, 10).until(
        EC.element_to_be_clickable((by.By.XPATH, '//td[text()="Observation and Proposal filters"]'))).click()
    # toggle the observation availability dropdown
    driver.find_element_by_xpath('//input[@title="Observation Availability"]/../../td[2]/div/img').click()
    # wait until the dropdown elements are available, then click "Proprietary"
    ui.WebDriverWait(driver, 10).until(
        EC.element_to_be_clickable((by.By.XPATH, '//div[text()="Proprietary" and @class="gwt-Label"]'))).click()
    # unfold the display options panel
    driver.find_element_by_xpath('//td[text()="Display options"]').click()
    # deselect "pointed observations"
    driver.find_element_by_id('gwt-uid-241').click()
    # select "epic exposures"
    driver.find_element_by_id('gwt-uid-240').click()
    # uncomment if you want to go through the activated settings and verify them;
    # when commented, the form is submitted immediately
    #time.sleep(5)
    # submit the form
    driver.find_element_by_xpath('//button/span[text()="Submit"]/../img').click()
    # wait until the results table has at least one row
    ui.WebDriverWait(driver, 10).until(EC.presence_of_element_located((by.By.XPATH, '//tr[@class="MPI"]')))
    # click on save
    driver.find_element_by_xpath('//span[text()="Save table as"]').click()
    # wait for the dropdown with the "CSV" entry to appear
    el = ui.WebDriverWait(driver, 10).until(EC.element_to_be_clickable((by.By.XPATH, '//a[@title="Save as CSV, Comma Separated Values"]')))
    # somehow, clickability does not suffice - selenium still complains about the wrong element being clicked
    # as a dirty workaround, wait a fixed amount of time to let js finish the ui update
    time.sleep(1)
    # click on the "CSV" entry
    el.click()
    # selenium can't tell whether the file is being downloaded, so we have to check ourselves.
    # this is a quick-and-dirty loop that waits until a new csv file appears in the dl dir;
    # replace with watchdog or similar if you need something more robust
    dl_max_wait_time = 10  # secs
    seconds = 0
    while seconds < dl_max_wait_time:
        time.sleep(1)
        csvs_new = {file for file in os.listdir(dl_dir) if file.startswith('NXSA-Results-') and file.endswith('.csv')}
        if csvs_new - csvs_old:  # new file found in dl dir
            print('Downloaded file should be one of {}'.format([os.path.join(dl_dir, file) for file in csvs_new - csvs_old]))
            break
        seconds += 1
    # we're done, so close the browser
    driver.close()

# script entry point
if __name__ == '__main__':
    main()
If everything is fine, the script should output something like:
Downloaded file should be one of ['/tmp/NXSA-Results-1509061710475.csv']
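The new-file check at the end of the script can be pulled out into a reusable helper. Here is a minimal sketch; the function name `wait_for_new_file` and its parameters are my own, not part of the script above:

```python
import os
import time

def wait_for_new_file(dl_dir, prefix, suffix, known, max_wait=10, poll=1.0):
    """Poll dl_dir until a file matching prefix/suffix that is not in `known` appears.

    Returns the set of new file names, or an empty set on timeout.
    """
    waited = 0.0
    while waited <= max_wait:
        current = {f for f in os.listdir(dl_dir)
                   if f.startswith(prefix) and f.endswith(suffix)}
        new = current - known
        if new:
            return new
        time.sleep(poll)
        waited += poll
    return set()
```

With that in place, the tail of `main()` would reduce to something like `wait_for_new_file(dl_dir, 'NXSA-Results-', '.csv', csvs_old)`. Note this still only detects that a file appeared, not that the download finished; a robust version would also watch for the browser's partial-download extension (e.g. `.crdownload` for Chrome).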
Upvotes: 0
Reputation: 22440
Try this. You need to process the rest according to your needs; this is the gist of it. It produces the results below:
import requests

url = "http://nxsa.esac.esa.int/nxsa-sl/servlet/observations-metadata?RESOURCE_CLASS=OBSERVATION&ADQLQUERY=SELECT%20DISTINCT%20OBSERVATION.OBSERVATION_OID,OBSERVATION.MOVING_TARGET,OBSERVATION.OBSERVATION_ID,EPIC_OBSERVATION_IMAGE.ICON,EPIC_OBSERVATION_IMAGE.ICON_PREVIEW,RGS_FLUXED_OBSERVATION_IMAGE.ICON,RGS_FLUXED_OBSERVATION_IMAGE.ICON_PREVIEW,EPIC_MOVING_TARGET_OBSERVATION_IMAGE.ICON,EPIC_MOVING_TARGET_OBSERVATION_IMAGE.ICON_PREVIEW,RGS_FLUXED_MOVING_TARGET_OBSERVATION_IMAGE.ICON,RGS_FLUXED_MOVING_TARGET_OBSERVATION_IMAGE.ICON_PREVIEW,OM_OBSERVATION_IMAGE.ICON_PREVIEW_V,OM_OBSERVATION_IMAGE.ICON_PREVIEW_B,OM_OBSERVATION_IMAGE.ICON_PREVIEW_L,OM_OBSERVATION_IMAGE.ICON_PREVIEW_U,OM_OBSERVATION_IMAGE.ICON_PREVIEW_M,OM_OBSERVATION_IMAGE.ICON_PREVIEW_S,OM_OBSERVATION_IMAGE.ICON_PREVIEW_W,OM_OBSERVATION_IMAGE.ICON_V,OM_OBSERVATION_IMAGE.ICON_B,OM_OBSERVATION_IMAGE.ICON_L,OM_OBSERVATION_IMAGE.ICON_U,OM_OBSERVATION_IMAGE.ICON_M,OM_OBSERVATION_IMAGE.ICON_S,OM_OBSERVATION_IMAGE.ICON_W,OBSERVATION.REVOLUTION,OBSERVATION.PROPRIETARY_END_DATE,OBSERVATION.RA_NOM,OBSERVATION.DEC_NOM,OBSERVATION.POSITION_ANGLE,OBSERVATION.START_UTC,OBSERVATION.END_UTC,OBSERVATION.DURATION,OBSERVATION.TARGET,PROPOSAL.TYPE,PROPOSAL.CATEGORY,PROPOSAL.AO,PROPOSAL.PI_FIRST_NAME,PROPOSAL.PI_SURNAME,TARGET_TYPE.DESCRIPTION,OBSERVATION.LII,OBSERVATION.BII,OBSERVATION.ODF_VERSION,OBSERVATION.PPS_VERSION,OBSERVATION.COORD_OBS,OBSERVATION.COORD_TYPE%20FROM%20FIELD_NOT_USED%20%20WHERE%20OBSERVATION.PROPRIETARY_END_DATE%3E%272017-10-18%27%20%20AND%20%20(PROPOSAL.TYPE=%27Calibration%27%20OR%20PROPOSAL.TYPE=%27Int%20Calibration%27%20OR%20PROPOSAL.TYPE=%27Co-Chandra%27%20OR%20PROPOSAL.TYPE=%27Co-ESO%27%20OR%20PROPOSAL.TYPE=%27GO%27%20OR%20PROPOSAL.TYPE=%27HST%27%20OR%20PROPOSAL.TYPE=%27Large%27%20OR%20PROPOSAL.TYPE=%27Large-Joint%27%20OR%20PROPOSAL.TYPE=%27Triggered%27%20OR%20PROPOSAL.TYPE=%27Target-Opportunity%27%20OR%20PROPOSAL.TYPE=%27TOO%27%20OR%20PROPOSAL.TYPE=%27Triggered-Joint%27)%20%20%20ORDER%20BY%20OBSERVATION.OBSERVATION_ID&PAGE=1&PAGE_SIZE=100&RETURN_TYPE=JSON"
res = requests.get(url)
data = res.json()
result = data['data']
for item in result:
    ID = item['OBSERVATION__OBSERVATION_ID']
    Surname = item['PROPOSAL__PI_SURNAME']
    Name = item['PROPOSAL__PI_FIRST_NAME']
    print(ID, Surname, Name)
Partial results (ID, Surname and Name):
0740071301 La Palombara Nicola
0741732601 Kaspi Victoria
0741732701 Kaspi Victoria
0741732801 Kaspi Victoria
0742150101 Grosso Nicolas
0742240801 Roberts Timothy
Btw, when you reach the target page you will notice two tabs there. These results are derived from the (OBSERVATIONS) tab. The link I used above can be found in the Chrome developer tools as well.
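Since the original goal was a CSV file, the JSON rows from this endpoint can also be written out with the stdlib csv module. A sketch, assuming each item carries the keys shown above (the helper name `rows_to_csv` and the default field selection are my own):

```python
import csv

def rows_to_csv(rows, path, fields=('OBSERVATION__OBSERVATION_ID',
                                    'PROPOSAL__PI_SURNAME',
                                    'PROPOSAL__PI_FIRST_NAME')):
    # write one header row plus one row per item, keeping only the selected
    # fields; extrasaction='ignore' drops any other keys in the dicts
    with open(path, 'w', newline='') as fh:
        writer = csv.DictWriter(fh, fieldnames=list(fields), extrasaction='ignore')
        writer.writeheader()
        writer.writerows(rows)
```

After the request above you would call e.g. `rows_to_csv(data['data'], 'nxsa_results.csv')` (the output file name is just an example). Paging via the PAGE parameter would be needed to get more than the first 100 rows.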
Upvotes: 0
Reputation: 3054
Unfortunately, I don't think you're going to be able to do this via requests. As far as I can tell, there is no POST being made when you click "Submit". It appears as though all the data is being generated by JavaScript, which requests can't deal with.
You could try using something like Selenium to automate a browser (which can handle the JS) and then scrape data from there.
Upvotes: 1