Elias Urra

Reputation: 93

Web scraping a dynamic web page with Python using BeautifulSoup or any other library

I am trying to scrape the data shown on this site, but I have not succeeded.

I need to create a dataframe with the columns: 'Fecha', 'Hora', 'Area', 'Turno', '% Solidos', 'Malla 65#', 'Comentarios'

The idea is to scrape from an init_date to an end_date that I define in the Python script, appending each day's data to the end of the dataframe. Can anyone help me?

Upvotes: 0

Views: 68

Answers (2)

chitown88

Reputation: 28640

Andrej's solution works as long as the token doesn't change. You get that token from the login POST request. By the way, if the credentials are not some generic guest username and password, you should change them: they can be found through the link you provided. I'm posting them here partially masked, but anyone can go to that link and get them, so CHANGE THEM NOW! If they are generic, let me know and I'll edit this post.

import requests
import pandas as pd


# Get the token from the login endpoint
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36'}
login = {
    'Password': "12****",
    'Usuario': "ag******"}

response = requests.post('https://apidch.kairosmining.com/auth/login', headers=headers, data=login).json()

# Put the token into the authorization header
url = 'https://apidch.kairosmining.com/api/CalidadAFlotacion'
headers.update({'authorization': 'Bearer %s' % response['token']})
payload = {
    'fecha': "2021-5-12",
    'parametro': '1'}

# Request one day of data and load it into a DataFrame
jsonData = requests.post(url, headers=headers, data=payload).json()
df = pd.DataFrame(jsonData)
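
The question also asks for a range of dates rather than a single day. Here is a minimal sketch of how that loop might look. The names `fecha_strings`, `collect_range`, and `fetch_day` are hypothetical, and I'm assuming the API accepts the same unpadded "YYYY-M-D" format for every day and returns a list of row dicts, as in the request above.

```python
from datetime import date, timedelta

import pandas as pd


def fecha_strings(init_date, end_date):
    """Yield one 'YYYY-M-D' string per day, inclusive of both ends."""
    day = init_date
    while day <= end_date:
        # The payloads in this thread use unpadded month/day ("2021-5-12").
        yield f"{day.year}-{day.month}-{day.day}"
        day += timedelta(days=1)


def collect_range(init_date, end_date, fetch_day):
    """Call fetch_day(fecha) for each day and stack the results.

    fetch_day is a hypothetical callable that takes a 'fecha' string
    and returns a list of row dicts (assumed response shape).
    """
    frames = []
    for fecha in fecha_strings(init_date, end_date):
        rows = fetch_day(fecha)
        if rows:  # skip days that return no records
            frames.append(pd.DataFrame(rows))
    if not frames:
        return pd.DataFrame()
    # ignore_index gives one continuous index across all days
    return pd.concat(frames, ignore_index=True)
```

Here `fetch_day` could simply wrap the `requests.post(url, headers=headers, data=payload).json()` call above, with `payload['fecha']` set to the string it receives. Passing the fetcher in as an argument keeps the date logic separate from the network call, so you can try it with fake data first.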

Upvotes: 1

Andrej Kesely

Reputation: 195593

The data is loaded from an external URL using JavaScript. You can use this example to load it into a pandas DataFrame:

import requests
import pandas as pd


api_url = "https://apidch.kairosmining.com/api/CalidadAFlotacion"

payload = {"fecha": "2021-5-15", "parametro": 1}  # <-- you can change the "fecha" here
headers = {
    "Authorization": "Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJ1c3IiOiJhZ2FqYXJkbyIsImVtbCI6ImFuZ2VsLmdhamFyZG9AaG9uZXl3ZWxsLmNvbSIsImlhdCI6MTYyMTI3MDc3MSwiZXhwIjoxNjIyNDgwMzcxfQ.pXy_UiOA6jz3YO4iMRulXbeEVTgSqgjzcJVd0URhp3I"
}

data = requests.post(api_url, headers=headers, json=payload).json()

df = pd.DataFrame(data)
print(df)

Prints:

      Id                     Fecha Area Turno  Solidos  Malla Comentarios         dia   hora
0   5395  2021-05-15T00:18:11.256Z   A1     B       41     32        None  15-05-2021  00:18
1   5396  2021-05-15T00:46:19.146Z   A0     B       44     32        None  15-05-2021  00:46
2   5397  2021-05-15T00:56:39.883Z   A2     B       42     29        None  15-05-2021  00:56
3   5398  2021-05-15T01:35:00.066Z   A1     B       41     31        None  15-05-2021  01:35
4   5399  2021-05-15T01:50:45.716Z   A2     B       41     27        None  15-05-2021  01:50
5   5400  2021-05-15T02:17:47.996Z   A0     B       39     30        None  15-05-2021  02:17
6   5401  2021-05-15T03:08:07.720Z   A1     B       42     34        None  15-05-2021  03:08
7   5402  2021-05-15T04:09:07.303Z   A0     B       38     28        None  15-05-2021  04:09
8   5403  2021-05-15T08:19:43.513Z   A0     A       41     26       07:14  15-05-2021  08:19
9   5404  2021-05-15T08:20:15.650Z   A2     A       37     27       07:41  15-05-2021  08:20
10  5405  2021-05-15T09:19:45.730Z   A2     A       43     34       08:53  15-05-2021  09:19
11  5406  2021-05-15T09:21:35.190Z   A0     A       42     32       09:14  15-05-2021  09:21
12  5407  2021-05-15T09:52:41.526Z   A2     A       44     31       09:51  15-05-2021  09:52
13  5408  2021-05-15T10:01:10.940Z   A1     A       46     37       10:01  15-05-2021  10:01
14  5409  2021-05-15T10:33:58.743Z   A1     A       44     35       10:33  15-05-2021  10:33
15  5410  2021-05-15T10:52:42.776Z   A2     A       42     28       10:52  15-05-2021  10:52
16  5411  2021-05-15T11:54:48.493Z   A2     A       41     30       11:54  15-05-2021  11:54
17  5412  2021-05-15T12:05:30.396Z   A1     A       44     33       12:05  15-05-2021  12:05
18  5413  2021-05-15T13:26:54.110Z   A2     A       41     28       13:26  15-05-2021  13:26
19  5414  2021-05-15T13:49:43.596Z   A1     A       47     41       13:47  15-05-2021  13:49
20  5415  2021-05-15T15:32:27.373Z   A1     A       45     36       15:30  15-05-2021  15:32
21  5416  2021-05-15T16:27:25.583Z   A0     A       44     34       16:27  15-05-2021  16:27
22  5417  2021-05-15T16:38:24.053Z   A1     A       43     33       16:38  15-05-2021  16:38
23  5418  2021-05-15T16:51:34.783Z   A2     A       36     23       16:49  15-05-2021  16:51
24  5419  2021-05-15T17:40:59.720Z   A1     A       41     33       17:40  15-05-2021  17:40
25  5420  2021-05-15T18:10:40.693Z   A2     A       38     26       18:10  15-05-2021  18:10
26  5421  2021-05-15T19:29:50.450Z   A2     B       35     20        None  15-05-2021  19:29
27  5422  2021-05-15T20:08:44.440Z   A2     B       38     26        None  15-05-2021  20:08
28  5423  2021-05-15T20:11:20.070Z   A0     B       48     33        None  15-05-2021  20:11
29  5424  2021-05-15T20:11:28.746Z   A1     B       39     28        None  15-05-2021  20:11
30  5425  2021-05-15T21:41:33.696Z   A2     B       39     26        None  15-05-2021  21:41
31  5426  2021-05-15T21:55:58.496Z   A1     B       42     34        None  15-05-2021  21:55
32  5427  2021-05-15T22:02:02.870Z   A0     B       48     33        None  15-05-2021  22:02
33  5428  2021-05-15T22:59:38.010Z   A2     B       42     28        None  15-05-2021  22:59
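
To get the column names from the question, something like the following might work. This is only a sketch: the mapping in `COLUMN_MAP` is my guess at which API field corresponds to which requested column (for example, that `dia` and `hora` are the wanted 'Fecha' and 'Hora', and that `Solidos` and `Malla` are '% Solidos' and 'Malla 65#'), and `to_report_columns` is a hypothetical helper name.

```python
import pandas as pd

# Assumed mapping from API field names to the column names in the question.
COLUMN_MAP = {
    "dia": "Fecha",
    "hora": "Hora",
    "Area": "Area",
    "Turno": "Turno",
    "Solidos": "% Solidos",
    "Malla": "Malla 65#",
    "Comentarios": "Comentarios",
}


def to_report_columns(df):
    """Keep only the mapped columns, renamed, in the order of COLUMN_MAP.

    Selecting first drops the original ISO-timestamp "Fecha" and "Id"
    columns, so renaming "dia" to "Fecha" does not create a duplicate.
    """
    return df[list(COLUMN_MAP)].rename(columns=COLUMN_MAP)
```

Calling `to_report_columns(df)` on the DataFrame printed above should then give the seven columns in the requested order.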

Upvotes: 1
