XaniloM

Reputation: 33

Replace Selenium with Requests in dynamic form

I want to scrape the dynamic form on [this][1] page. I'm currently using Selenium for that and getting some results.

My questions:

  1. Is it possible to replace the Selenium + WebDriver code with a POST request? (I have worked with Requests before, but only against an API... I can't figure out how to reverse engineer this form.)

  2. Is there a better way to clean up the result page so I get only the table? (In my example, the resulting "data" variable is a mess, but I did manage to get the last value, which was the main purpose of the script.)

  3. Any recommendations?

My code:

from selenium import webdriver
import pandas as pd

from bs4 import BeautifulSoup

def get_tables(htmldoc):
    # specify a parser explicitly to avoid BeautifulSoup's "no parser" warning
    soup = BeautifulSoup(htmldoc, 'html.parser')
    return soup.find_all('table')

driver = webdriver.Chrome()
driver.get("http://dgasatel.mop.cl/visita_new.asp")

# select the station, the report options, and submit the form
estacion1 = driver.find_element_by_name("estacion1")
estacion1.send_keys("08370007-6")
driver.find_element_by_xpath("//input[@name='chk_estacion1a' and @value='08370007-6_29']").click()
driver.find_element_by_xpath("//input[@name='period' and @value='1d']").click()
driver.find_element_by_xpath("//input[@name='tiporep' and @value='I']").click()
driver.find_element_by_name("button22").click()

data = pd.read_html(driver.page_source)

print(data[4].tail(1).iloc[0][2])

Thanks in advance.

  [1]: http://dgasatel.mop.cl/visita_new.asp

Upvotes: 3

Views: 993

Answers (1)

B.Adler

Reputation: 1539

The short answer to your question is yes, you can use the Requests library to make the POST request. For an example, open the inspector in your browser, copy the request as cURL, and convert it to Python using the following site:

https://curl.trillworks.com/

Then you can feed the response.text into BeautifulSoup to parse out the tables you want.
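For instance, a minimal sketch of that parsing step (using a stub HTML string in place of the real `response.text`, which is an assumption here):

```python
from bs4 import BeautifulSoup

# stub HTML standing in for response.text from the POST request below
html = "<html><body><table><tr><td>42</td></tr></table></body></html>"

soup = BeautifulSoup(html, "html.parser")
tables = soup.find_all("table")  # list of <table> tags to pick from
```

From there you can index into `tables` to grab the one you actually want.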

When I do this with the site in your example I get the following:

import requests

cookies = {
    'ASPSESSIONIDCQTTBCRB': 'BFDPGLCCEJMKPFKGJJFHKHFC',
}

headers = {
    'Connection': 'keep-alive',
    'Pragma': 'no-cache',
    'Cache-Control': 'no-cache',
    'Origin': 'http://dgasatel.mop.cl',
    'Upgrade-Insecure-Requests': '1',
    'Content-Type': 'application/x-www-form-urlencoded',
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'Referer': 'http://dgasatel.mop.cl/filtro_paramxestac_new.asp',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'en-US,en;q=0.9',
}

data = {
  'estacion1': '-1',
  'estacion2': '-1',
  'estacion3': '-1',
  'accion': 'refresca',
  'tipo': 'ANO',
  'fecha_fin': '11/12/2018',
  'hora_fin': '0',
  'period': '1d',
  'fecha_ini': '11/12/2018',
  'fecha_finP': '11/12/2018',
  'UserID': 'nobody',
  'EsDL1': '0',
  'EsDL2': '0',
  'EsDL3': '0'
}

response = requests.post(
    'http://dgasatel.mop.cl/filtro_paramxestac_new.asp',
    headers=headers, cookies=cookies, data=data)

For cleaning up the data, I recommend mapping the data points you want into a dictionary, or writing them to a CSV, with a loop over the parsed tables.

tables = pd.read_html(response.text)  # requires pandas, imported as pd

for table in tables:
    last_row = table.tail(1)
    # guard against empty or narrow tables before indexing column 2
    if not last_row.empty and last_row.shape[1] > 2:
        print(last_row.iloc[0][2])
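If you go the CSV route, a minimal sketch looks like this (the rows and column names here are made up for illustration; you would build `rows` from the parsed tables):

```python
import csv

# hypothetical rows pulled out of the parsed tables
rows = [
    {"fecha": "11/12/2018", "valor": "1.23"},
    {"fecha": "12/12/2018", "valor": "1.31"},
]

with open("mediciones.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["fecha", "valor"])
    writer.writeheader()
    writer.writerows(rows)
```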

Upvotes: 1
