Reputation: 763
I am trying to post several parameters to this [url][1] and press 'Submit' to download the generated CSV file.
I think at least 5 steps are needed.
Upvotes: 8
Views: 1033
Reputation: 66431
Since no one has posted a solution yet, here you go. You won't get far with requests, so Selenium is your best choice here. If you want to use the script below without modification, check that:
- dl_dir = '/tmp' points to a directory you want the downloads to land in
- chromedriver is installed, or change the driver to Firefox in the code (and adapt the download dir configuration to what Firefox expects)

Here is the environment I tested with:
$ python -V
Python 3.5.3
$ chromedriver --version
ChromeDriver 2.33.506106 (8a06c39c4582fbfbab6966dbb1c38a9173bfb1a2)
$ pip list --format=freeze | grep selenium
selenium==3.6.0
I commented almost every line, so let the code do the talking:
import os
import time
from selenium import webdriver
from selenium.webdriver.common import by
from selenium.webdriver.remote.webelement import WebElement
from selenium.webdriver.support import ui, expected_conditions as EC
def main():
    dl_dir = '/tmp'  # temporary download dir so I don't spam the real dl dir with csv files
    # check which files exist before the scraping starts (will be explained later)
    csvs_old = {file for file in os.listdir(dl_dir) if file.startswith('NXSA-Results-') and file.endswith('.csv')}
    # I use chrome, so check that you have chromedriver installed
    # pass the custom dl dir to the browser instance
    chrome_options = webdriver.ChromeOptions()
    prefs = {'download.default_directory': dl_dir}
    chrome_options.add_experimental_option('prefs', prefs)
    driver = webdriver.Chrome(chrome_options=chrome_options)
    # open page
    driver.get('http://nxsa.esac.esa.int/nxsa-web/#search')
    # wait for the search ui to appear (abort after 10 secs)
    # once there, unfold the filters panel
    ui.WebDriverWait(driver, 10).until(
        EC.element_to_be_clickable((by.By.XPATH, '//td[text()="Observation and Proposal filters"]'))).click()
    # toggle the observation availability dropdown
    driver.find_element_by_xpath('//input[@title="Observation Availability"]/../../td[2]/div/img').click()
    # wait until the dropdown elements are available, then click "Proprietary"
    ui.WebDriverWait(driver, 10).until(
        EC.element_to_be_clickable((by.By.XPATH, '//div[text()="Proprietary" and @class="gwt-Label"]'))).click()
    # unfold the display options panel
    driver.find_element_by_xpath('//td[text()="Display options"]').click()
    # deselect "pointed observations"
    driver.find_element_by_id('gwt-uid-241').click()
    # select "epic exposures"
    driver.find_element_by_id('gwt-uid-240').click()
    # uncomment if you want to go through the activated settings and verify them;
    # when commented, the form is submitted immediately
    #time.sleep(5)
    # submit the form
    driver.find_element_by_xpath('//button/span[text()="Submit"]/../img').click()
    # wait until the results table has at least one row
    ui.WebDriverWait(driver, 10).until(EC.presence_of_element_located((by.By.XPATH, '//tr[@class="MPI"]')))
    # click on save
    driver.find_element_by_xpath('//span[text()="Save table as"]').click()
    # wait for the dropdown with the "CSV" entry to appear
    el = ui.WebDriverWait(driver, 10).until(EC.element_to_be_clickable((by.By.XPATH, '//a[@title="Save as CSV, Comma Separated Values"]')))
    # somehow, clickability does not suffice - selenium still complains about the wrong element being clicked
    # as a dirty workaround, wait a fixed amount of time to let js finish the ui update
    time.sleep(1)
    # click on the "CSV" entry
    el.click()
    # selenium can't tell whether the file is being downloaded, so we have to check ourselves.
    # this is a quick-and-dirty loop that waits until a new csv file appears in the dl dir;
    # replace with watchdog or similar if you need something more robust
    dl_max_wait_time = 10  # secs
    seconds = 0
    while seconds < dl_max_wait_time:
        time.sleep(1)
        csvs_new = {file for file in os.listdir(dl_dir) if file.startswith('NXSA-Results-') and file.endswith('.csv')}
        if csvs_new - csvs_old:  # new file found in dl dir
            print('Downloaded file should be one of {}'.format([os.path.join(dl_dir, file) for file in csvs_new - csvs_old]))
            break
        seconds += 1
    # we're done, so close the browser
    driver.close()

# script entry point
if __name__ == '__main__':
    main()
If everything is fine, the script should output something like:
Downloaded file should be one of ['/tmp/NXSA-Results-1509061710475.csv']
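The new-file check at the end of the script can be pulled out into a reusable helper. Here is a minimal sketch; the function name `wait_for_new_file` and its parameters are my own, not part of the script above:

```python
import os
import time

def wait_for_new_file(dl_dir, prefix, suffix, known, max_wait=10, poll=1.0):
    """Poll dl_dir until a file matching prefix/suffix that is not in `known` appears.

    Returns the set of new file names, or an empty set on timeout.
    """
    waited = 0.0
    while waited <= max_wait:
        current = {f for f in os.listdir(dl_dir)
                   if f.startswith(prefix) and f.endswith(suffix)}
        new = current - known
        if new:
            return new
        time.sleep(poll)
        waited += poll
    return set()
```

With that in place, the tail of `main()` would reduce to something like `wait_for_new_file(dl_dir, 'NXSA-Results-', '.csv', csvs_old)`. Note this still only detects that a file appeared, not that the download finished; a robust version would also watch for the browser's partial-download extension (e.g. `.crdownload` for Chrome).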
Upvotes: 0
Reputation: 22440
Try this. You need to process the rest according to your needs; this is the gist of it. It produces the results below:
import requests

url = "http://nxsa.esac.esa.int/nxsa-sl/servlet/observations-metadata?RESOURCE_CLASS=OBSERVATION&ADQLQUERY=SELECT%20DISTINCT%20OBSERVATION.OBSERVATION_OID,OBSERVATION.MOVING_TARGET,OBSERVATION.OBSERVATION_ID,EPIC_OBSERVATION_IMAGE.ICON,EPIC_OBSERVATION_IMAGE.ICON_PREVIEW,RGS_FLUXED_OBSERVATION_IMAGE.ICON,RGS_FLUXED_OBSERVATION_IMAGE.ICON_PREVIEW,EPIC_MOVING_TARGET_OBSERVATION_IMAGE.ICON,EPIC_MOVING_TARGET_OBSERVATION_IMAGE.ICON_PREVIEW,RGS_FLUXED_MOVING_TARGET_OBSERVATION_IMAGE.ICON,RGS_FLUXED_MOVING_TARGET_OBSERVATION_IMAGE.ICON_PREVIEW,OM_OBSERVATION_IMAGE.ICON_PREVIEW_V,OM_OBSERVATION_IMAGE.ICON_PREVIEW_B,OM_OBSERVATION_IMAGE.ICON_PREVIEW_L,OM_OBSERVATION_IMAGE.ICON_PREVIEW_U,OM_OBSERVATION_IMAGE.ICON_PREVIEW_M,OM_OBSERVATION_IMAGE.ICON_PREVIEW_S,OM_OBSERVATION_IMAGE.ICON_PREVIEW_W,OM_OBSERVATION_IMAGE.ICON_V,OM_OBSERVATION_IMAGE.ICON_B,OM_OBSERVATION_IMAGE.ICON_L,OM_OBSERVATION_IMAGE.ICON_U,OM_OBSERVATION_IMAGE.ICON_M,OM_OBSERVATION_IMAGE.ICON_S,OM_OBSERVATION_IMAGE.ICON_W,OBSERVATION.REVOLUTION,OBSERVATION.PROPRIETARY_END_DATE,OBSERVATION.RA_NOM,OBSERVATION.DEC_NOM,OBSERVATION.POSITION_ANGLE,OBSERVATION.START_UTC,OBSERVATION.END_UTC,OBSERVATION.DURATION,OBSERVATION.TARGET,PROPOSAL.TYPE,PROPOSAL.CATEGORY,PROPOSAL.AO,PROPOSAL.PI_FIRST_NAME,PROPOSAL.PI_SURNAME,TARGET_TYPE.DESCRIPTION,OBSERVATION.LII,OBSERVATION.BII,OBSERVATION.ODF_VERSION,OBSERVATION.PPS_VERSION,OBSERVATION.COORD_OBS,OBSERVATION.COORD_TYPE%20FROM%20FIELD_NOT_USED%20%20WHERE%20OBSERVATION.PROPRIETARY_END_DATE%3E%272017-10-18%27%20%20AND%20%20(PROPOSAL.TYPE=%27Calibration%27%20OR%20PROPOSAL.TYPE=%27Int%20Calibration%27%20OR%20PROPOSAL.TYPE=%27Co-Chandra%27%20OR%20PROPOSAL.TYPE=%27Co-ESO%27%20OR%20PROPOSAL.TYPE=%27GO%27%20OR%20PROPOSAL.TYPE=%27HST%27%20OR%20PROPOSAL.TYPE=%27Large%27%20OR%20PROPOSAL.TYPE=%27Large-Joint%27%20OR%20PROPOSAL.TYPE=%27Triggered%27%20OR%20PROPOSAL.TYPE=%27Target-Opportunity%27%20OR%20PROPOSAL.TYPE=%27TOO%27%20OR%20PROPOSAL.TYPE=%27Triggered-Joint%27)%20%20%20ORDER%20BY%20OBSERVATION.OBSERVATION_ID&PAGE=1&PAGE_SIZE=100&RETURN_TYPE=JSON"
res = requests.get(url)
data = res.json()
result = data['data']
for item in result:
    ID = item['OBSERVATION__OBSERVATION_ID']
    Surname = item['PROPOSAL__PI_SURNAME']
    Name = item['PROPOSAL__PI_FIRST_NAME']
    print(ID, Surname, Name)
Partial results (ID, Surname and Name):
0740071301 La Palombara Nicola
0741732601 Kaspi Victoria
0741732701 Kaspi Victoria
0741732801 Kaspi Victoria
0742150101 Grosso Nicolas
0742240801 Roberts Timothy
Btw, when you reach the target page you will notice two tabs there. These results are derived from the (OBSERVATIONS) tab. The link I used above can be found in the Chrome developer tools as well.
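Since the original goal was a CSV file, the JSON rows from this endpoint can also be written out with the stdlib csv module. A sketch, assuming each item carries the keys shown above (the helper name `rows_to_csv` and the default field selection are my own):

```python
import csv

def rows_to_csv(rows, path, fields=('OBSERVATION__OBSERVATION_ID',
                                    'PROPOSAL__PI_SURNAME',
                                    'PROPOSAL__PI_FIRST_NAME')):
    # write one header row plus one row per item, keeping only the selected
    # fields; extrasaction='ignore' drops any other keys in the dicts
    with open(path, 'w', newline='') as fh:
        writer = csv.DictWriter(fh, fieldnames=list(fields), extrasaction='ignore')
        writer.writeheader()
        writer.writerows(rows)
```

After the request above you would call e.g. `rows_to_csv(data['data'], 'nxsa_results.csv')` (the output file name is just an example). Paging via the PAGE parameter would be needed to get more than the first 100 rows.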
Upvotes: 0
Reputation: 3054
Unfortunately, I don't think you're going to be able to do this via requests. As far as I can tell, there is no POST being made when you click "Submit". It appears as though all the data is being generated by JavaScript, which requests can't deal with.
You could try using something like Selenium to automate a browser (which can handle the JS) and then scrape data from there.
Upvotes: 1