Apolo Reis

Reputation: 97

How to get the JSON data from this site in Python 3?

My work is basically:

- Enter the website "https://aplicacoes.mds.gov.br/sagirmps/estrutura_fisica/preenchimento_municipio_cras_new1.php"

- Fill in the two form fields (with "AC - Acre" and "Bujari", for example)

- Click on "Dados Detalhados" (detailed data) in the last column of the generated table. (Clicking "Dados Detalhados" generates a second table with the data of one month per row.)

- Access the data in the second table by clicking "Visualizar Relatório" in the last column of each row. <---- THAT'S the data I'm trying to scrape. But it is a dynamic website and I can't get the data just by accessing url2 (when you click "Visualizar Relatório" the website returns to the initial URL, but with the tables I want to scrape). Here is the code:

import requests
from bs4 import BeautifulSoup  
import pandas as pd

url = 'http://aplicacoes.mds.gov.br/sagirmps/estrutura_fisica/preenchimento_municipio_cras_new1.php'
params = {
    'uf_ibge': '12',           # IBGE code of the state (AC - Acre)
    'nome_estado': 'AC - Acre',
    'p_ibge': '1200138',       # IBGE code of the municipality (Bujari)
    'nome_municipio': 'Bujari',
}


r = requests.post(url, params=params, verify=False)
soup = BeautifulSoup(r.text, "lxml")
tables = pd.read_html(r.text)
unidades = tables[1]
print(unidades)


url2 = 'http://aplicacoes.mds.gov.br/sagirmps/estrutura_fisica/rel_preenchidos_cras.php?&p_id_cras=12001301971'
params2 = {
    'p_id_cras': '12001301971',
    'mes_referencia': '2019-02-01',
}
r2 = requests.post(url2, json=params2, verify=False)
soup2 = BeautifulSoup(r2.text, 'lxml')

print(soup2)

Note that url2 is the URL generated when you click "Dados Detalhados"; it already carries 'p_id_cras' as a query parameter, the same value used in the second dictionary.

params2 should be the dict needed to scrape the data I'm talking about. I've tried the params, data and json arguments in the second POST request, but none of them works.
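
For reference, the difference between the three arguments is where requests puts the values: params builds the query string, data sends a form-encoded body, and json sends a JSON body. These are the variants I tried:

import requests

url2 = 'http://aplicacoes.mds.gov.br/sagirmps/estrutura_fisica/rel_preenchidos_cras.php'
params2 = {'p_id_cras': '12001301971', 'mes_referencia': '2019-02-01'}

# appended to the URL as ?p_id_cras=...&mes_referencia=...
r = requests.post(url2, params=params2, verify=False)

# sent as an application/x-www-form-urlencoded body
r = requests.post(url2, data=params2, verify=False)

# sent as an application/json body
r = requests.post(url2, json=params2, verify=False)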

Upvotes: 1

Views: 65

Answers (1)

furas

Reputation: 142641

url2 should use GET without parameters.
Then you get a page with a table whose links have href="javascript:"
but also onclick='enviadados(12001301971,"2019-02-01")',
so you have your parameters for the next request.

The last request uses POST with the parameters 12001301971, 2019-02-01 and the URL

https://aplicacoes.mds.gov.br/sagirmps/estrutura_fisica/visualiza_preenchimento_cras.php
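
Instead of fixed string slices, a regex should pull the two arguments out of the onclick value more robustly (a quick sketch):

import re

onclick = 'enviadados(12001301971,"2019-02-01")'

match = re.search(r'enviadados\((\d+),"([\d-]+)"\)', onclick)
if match:
    params = {
        'p_id_cras': match.group(1),       # '12001301971'
        'mes_referencia': match.group(2),  # '2019-02-01'
    }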

My code. I hope it works correctly.

import requests
from bs4 import BeautifulSoup  
import pandas as pd

base = 'http://aplicacoes.mds.gov.br/sagirmps/estrutura_fisica/'

url = base + 'preenchimento_municipio_cras_new1.php'
#print('url:', url)
params = {
    'uf_ibge': '12',
    'nome_estado': 'AC - Acre',
    'p_ibge': '1200138',
    'nome_municipio': 'Bujari',
}


r = requests.post(url, params=params, verify=False)
soup1 = BeautifulSoup(r.text, "lxml")
tables = pd.read_html(r.text)

#unidades = tables[1]
#print(unidades)

all_td1 = soup1.find('table', class_="panel-body").find_all('td')
#print('len(all_td1):', len(all_td1))
for td1 in all_td1:

    all_a1 = td1.find_all('a')[:1]  # take only the first link in each cell ("Dados Detalhados")
    #print('len(all_a1):', len(all_a1))
    for a1 in all_a1:

        url = base + a1['href']
        print('url:', url)

        r = requests.get(url, verify=False)
        soup2 = BeautifulSoup(r.text, "lxml")
        #print(soup.text)

        all_td2 = soup2.find('table', class_="panel-body").find_all('td')
        #print('len(all_td2):', len(all_td2))
        for td2 in all_td2:
            all_a2 = td2.find_all('a')
            #print('len(all_a2):', len(all_a2))
            for a2 in all_a2:
                print('onclick:', a2['onclick'])

                # slice the two arguments out of
                # onclick='enviadados(12001301971,"2019-02-01")'
                params = {
                    'p_id_cras': a2['onclick'][11:22],       # '12001301971'
                    'mes_referencia': a2['onclick'][24:-2],  # '2019-02-01'
                }

                print(params)

                url = 'https://aplicacoes.mds.gov.br/sagirmps/estrutura_fisica/visualiza_preenchimento_cras.php'
                r = requests.post(url, params=params, verify=False)
                soup = BeautifulSoup(r.text, "lxml")
                all_table = soup.find_all('table')
                print(all_table)
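
If you want the final tables as data rather than raw HTML, pandas should be able to parse them straight from the last response (assuming the report page contains plain <table> elements):

import pandas as pd

# one DataFrame per <table> found in the HTML
dfs = pd.read_html(r.text)
for df in dfs:
    print(df)
    # df.to_json() / df.to_csv() can then export the data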

Upvotes: 1
