Reputation: 93
I am trying to scrape this site, which has the data I need, but I have not succeeded.
I need to create a dataframe with the columns: 'Fecha', 'Hora', 'Area', 'Turno', '% Solidos', 'Malla 65#', 'Comentarios'
The idea is to scrape from an init_date to an end_date, both defined by me in the Python script, and to append each day's rows to the end of the dataframe.
Can anyone help me?
Upvotes: 0
Views: 68
Reputation: 28640
Andrej's solution works as long as the token doesn't change. You get that token from the login POST request. By the way, if those are not some generic guest username and password, you should change them: your credentials can be found in the link you provided. I'm posting them here partially masked, but anyone can go to that link and get them, so CHANGE THEM NOW! If they are generic, let me know and I'll edit this post.
import requests
import pandas as pd
# Log in to get a fresh token
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36'}
login = {
    'Password': "12****",
    'Usuario': "ag******",
}
response = requests.post('https://apidch.kairosmining.com/auth/login', headers=headers, data=login).json()
# Put the token into the request headers
url = 'https://apidch.kairosmining.com/api/CalidadAFlotacion'
headers.update({'authorization': 'Bearer %s' % response['token']})
payload = {
    'fecha': "2021-5-12",
    'parametro': '1',
}
jsonData = requests.post(url, headers=headers, data=payload).json()
df = pd.DataFrame(jsonData)
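To cover the init_date to end_date part of the question, something along these lines might work. It is only a sketch: `iter_days`, `collect_range` and the `fetch_day` callable are names I made up, and I am assuming the API wants the same unpadded 'fecha' format as the payload above (e.g. "2021-5-12") and returns a list of row dicts for each day.

```python
from datetime import date, timedelta


def iter_days(init_date, end_date):
    """Yield every date from init_date to end_date, inclusive."""
    current = init_date
    while current <= end_date:
        yield current
        current += timedelta(days=1)


def collect_range(init_date, end_date, fetch_day):
    """Call fetch_day(fecha) once per day and return all rows in one list.

    fetch_day is a hypothetical callable: it takes the 'fecha' string and
    returns the parsed JSON for that day (assumed to be a list of dicts).
    """
    rows = []
    for day in iter_days(init_date, end_date):
        # Unpadded year-month-day, matching the payload format used above
        fecha = f"{day.year}-{day.month}-{day.day}"
        day_rows = fetch_day(fecha)
        if day_rows:  # skip days that return nothing
            rows.extend(day_rows)
    return rows
```

With the variables from the code above, fetch_day could be `lambda fecha: requests.post(url, headers=headers, data={'fecha': fecha, 'parametro': '1'}).json()`, and `pd.DataFrame(collect_range(date(2021, 5, 1), date(2021, 5, 15), fetch_day))` would then build the dataframe once at the end, which is usually cheaper than appending to a dataframe day by day.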
Upvotes: 1
Reputation: 195593
The data is loaded from an external URL via JavaScript. You can use this example to load it into a pandas DataFrame:
import requests
import pandas as pd
api_url = "https://apidch.kairosmining.com/api/CalidadAFlotacion"
payload = {"fecha": "2021-5-15", "parametro": 1} # <-- you can change the "fecha" here
headers = {
"Authorization": "Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJ1c3IiOiJhZ2FqYXJkbyIsImVtbCI6ImFuZ2VsLmdhamFyZG9AaG9uZXl3ZWxsLmNvbSIsImlhdCI6MTYyMTI3MDc3MSwiZXhwIjoxNjIyNDgwMzcxfQ.pXy_UiOA6jz3YO4iMRulXbeEVTgSqgjzcJVd0URhp3I"
}
data = requests.post(api_url, headers=headers, json=payload).json()
df = pd.DataFrame(data)
print(df)
Prints:
Id Fecha Area Turno Solidos Malla Comentarios dia hora
0 5395 2021-05-15T00:18:11.256Z A1 B 41 32 None 15-05-2021 00:18
1 5396 2021-05-15T00:46:19.146Z A0 B 44 32 None 15-05-2021 00:46
2 5397 2021-05-15T00:56:39.883Z A2 B 42 29 None 15-05-2021 00:56
3 5398 2021-05-15T01:35:00.066Z A1 B 41 31 None 15-05-2021 01:35
4 5399 2021-05-15T01:50:45.716Z A2 B 41 27 None 15-05-2021 01:50
5 5400 2021-05-15T02:17:47.996Z A0 B 39 30 None 15-05-2021 02:17
6 5401 2021-05-15T03:08:07.720Z A1 B 42 34 None 15-05-2021 03:08
7 5402 2021-05-15T04:09:07.303Z A0 B 38 28 None 15-05-2021 04:09
8 5403 2021-05-15T08:19:43.513Z A0 A 41 26 07:14 15-05-2021 08:19
9 5404 2021-05-15T08:20:15.650Z A2 A 37 27 07:41 15-05-2021 08:20
10 5405 2021-05-15T09:19:45.730Z A2 A 43 34 08:53 15-05-2021 09:19
11 5406 2021-05-15T09:21:35.190Z A0 A 42 32 09:14 15-05-2021 09:21
12 5407 2021-05-15T09:52:41.526Z A2 A 44 31 09:51 15-05-2021 09:52
13 5408 2021-05-15T10:01:10.940Z A1 A 46 37 10:01 15-05-2021 10:01
14 5409 2021-05-15T10:33:58.743Z A1 A 44 35 10:33 15-05-2021 10:33
15 5410 2021-05-15T10:52:42.776Z A2 A 42 28 10:52 15-05-2021 10:52
16 5411 2021-05-15T11:54:48.493Z A2 A 41 30 11:54 15-05-2021 11:54
17 5412 2021-05-15T12:05:30.396Z A1 A 44 33 12:05 15-05-2021 12:05
18 5413 2021-05-15T13:26:54.110Z A2 A 41 28 13:26 15-05-2021 13:26
19 5414 2021-05-15T13:49:43.596Z A1 A 47 41 13:47 15-05-2021 13:49
20 5415 2021-05-15T15:32:27.373Z A1 A 45 36 15:30 15-05-2021 15:32
21 5416 2021-05-15T16:27:25.583Z A0 A 44 34 16:27 15-05-2021 16:27
22 5417 2021-05-15T16:38:24.053Z A1 A 43 33 16:38 15-05-2021 16:38
23 5418 2021-05-15T16:51:34.783Z A2 A 36 23 16:49 15-05-2021 16:51
24 5419 2021-05-15T17:40:59.720Z A1 A 41 33 17:40 15-05-2021 17:40
25 5420 2021-05-15T18:10:40.693Z A2 A 38 26 18:10 15-05-2021 18:10
26 5421 2021-05-15T19:29:50.450Z A2 B 35 20 None 15-05-2021 19:29
27 5422 2021-05-15T20:08:44.440Z A2 B 38 26 None 15-05-2021 20:08
28 5423 2021-05-15T20:11:20.070Z A0 B 48 33 None 15-05-2021 20:11
29 5424 2021-05-15T20:11:28.746Z A1 B 39 28 None 15-05-2021 20:11
30 5425 2021-05-15T21:41:33.696Z A2 B 39 26 None 15-05-2021 21:41
31 5426 2021-05-15T21:55:58.496Z A1 B 42 34 None 15-05-2021 21:55
32 5427 2021-05-15T22:02:02.870Z A0 B 48 33 None 15-05-2021 22:02
33 5428 2021-05-15T22:59:38.010Z A2 B 42 28 None 15-05-2021 22:59
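The printed columns do not match the names requested in the question, so a rename step is probably needed. Here is a minimal sketch. The mapping is my guess from the output above (I am assuming 'dia' and 'hora' are what the question calls 'Fecha' and 'Hora', and that 'Solidos' and 'Malla' correspond to '% Solidos' and 'Malla 65#'); `COLUMN_MAP` and `to_requested_columns` are hypothetical names.

```python
import pandas as pd

# Assumed mapping from the API's column names to the requested ones.
# The key order also sets the final column order.
COLUMN_MAP = {
    "dia": "Fecha",
    "hora": "Hora",
    "Area": "Area",
    "Turno": "Turno",
    "Solidos": "% Solidos",
    "Malla": "Malla 65#",
    "Comentarios": "Comentarios",
}


def to_requested_columns(df):
    """Return a new DataFrame with only the requested columns, renamed."""
    # Selecting first drops 'Id' and the raw ISO-timestamp 'Fecha',
    # so renaming 'dia' to 'Fecha' cannot create a duplicate column.
    return df[list(COLUMN_MAP)].rename(columns=COLUMN_MAP)
```

Calling `to_requested_columns(df)` on the dataframe above leaves `df` itself unchanged and returns the seven columns in the order listed in the question.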
Upvotes: 1