isoparme

Reputation: 11

Why doesn't BeautifulSoup's HTML parser return all the markup I see in the dev tools?

I am trying to scrape this page, but I only get a handful of tags back. Is the site dynamic, in which case I would probably need to run a script to see the data? I would then like to extract the values from the chart; the site displays the water level for my city. The code below returns almost nothing, yet I can see everything in Chrome's dev tools. Why? Thanks in advance for your help!

Here is the site (the time parameter in the URL is an epoch timestamp): http://aqualim.environnement.wallonie.be/Station.do?method=selectStation&time=1642669254241&station=L7880

Here is the code I tried, which prints the response:

import requests
from bs4 import BeautifulSoup

URL = "http://aqualim.environnement.wallonie.be/Station.do?method=selectStation&time=1642669254241&station=L7880"
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")
print(soup)

Upvotes: 0

Views: 63

Answers (2)

chitown88

Reputation: 28565

As stated, the request returns only the static HTML. The data you see in the dev tools is loaded dynamically after the page arrives in the browser.
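To make the difference concrete, here is a minimal sketch using a made-up HTML snippet (the markup and the value in the script are hypothetical, not taken from the site). It shows that BeautifulSoup only parses the markup it is given and never runs the script that would fill in the chart:

```python
from bs4 import BeautifulSoup

# Hypothetical page: the <div> is empty in the HTML the server sends;
# a browser would run the script and fill it in afterwards.
STATIC_HTML = """
<html>
  <body>
    <div id="chart"></div>
    <script>
      document.getElementById("chart").textContent = "water level: 1.23 m";
    </script>
  </body>
</html>
"""

soup = BeautifulSoup(STATIC_HTML, "html.parser")
chart = soup.find("div", id="chart")

# BeautifulSoup only parses markup; it never executes the script,
# so the div is still empty here.
chart_text = chart.get_text(strip=True)
print(repr(chart_text))
```

The dev tools show the DOM after the browser has run such scripts, which is why they show more than the response body does.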

You could use something like Puppeteer or Selenium to let the page render first, then pull and parse the HTML. Or you can get the data directly in a nice JSON format here.

I'm not sure what data you want exactly.

import pandas as pd
import requests

url = 'http://geoservices2.wallonie.be/arcgis/rest/services/APP_AQUALIM/STATION_PUBLIC/MapServer/0/query'
payload = {
    'f': 'json',
    'where': '1=1',
    'returnGeometry': 'true',
    'spatialRel': 'esriSpatialRelIntersects',
    'outFields': '*',
    'outSR': '31370'}

jsonData = requests.get(url, params=payload).json()

df = pd.json_normalize(jsonData['features'])

Output:

print(df)
    attributes.NOMSTA attributes.LOCALITE  ... geometry.x  geometry.y
0               L5021           Resteigne  ...   207730.0     86925.0
1               L5060           Romedenne  ...   174509.0     94935.0
2               L5170            Baisieux  ...   101819.0    119540.0
3               L5183                Onoz  ...   171179.0    130329.0
4               L5201             Rhisnes  ...   183172.0    130635.0
..                ...                 ...  ...        ...         ...
175             L8640              Anthée  ...   176992.0    103682.0
176             L8650               Gozin  ...   194984.0     90490.0
177             T0025    Faulx les Tombes  ...   195622.0    125555.0
178             T0054           Pépinster  ...   251109.0    140666.0
179             T0055  Trooz (temporaire)  ...   243552.0    140836.0

[180 rows x 8 columns]

To filter:

df_7880 = df[df['attributes.NOMSTA']=='L7880']

Output:

print(df_7880.to_string())
    attributes.NOMSTA attributes.LOCALITE attributes.RIVIERE  attributes.X_LAMBERT  attributes.Y_LAMBERT  attributes.ESRI_OID  geometry.x  geometry.y
152             L7880                 Ere    Rieu des Barges               79114.0              141573.0                  153     79114.0    141573.0
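As an alternative to filtering in pandas, the filter could probably be pushed to the server through the where parameter. This is only a sketch: it assumes the layer accepts a SQL-style where clause on the NOMSTA field, which is usual for this kind of query endpoint but which I have not checked against this service. The build_station_query helper is a hypothetical name introduced for illustration.

```python
def build_station_query(station_code):
    """Build query parameters that ask the server for a single station."""
    # Escape single quotes by doubling them, as in SQL string literals.
    safe_code = station_code.replace("'", "''")
    return {
        'f': 'json',
        # Assumption: the layer accepts a SQL-style filter on NOMSTA.
        'where': f"NOMSTA = '{safe_code}'",
        'returnGeometry': 'true',
        'outFields': '*',
        'outSR': '31370',
    }


payload = build_station_query('L7880')
print(payload['where'])
```

The resulting payload can be passed to requests.get(url, params=payload) exactly like the original one. If the assumption holds, the response should contain only the matching station.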

Upvotes: 0

jennie

Reputation: 55

The item that you want to scrape is rendered by JavaScript. The requests module only receives the static HTML. You can use Puppeteer to scrape everything that you see in the developer tools.
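Since the question uses Python, the same render-first idea can be sketched with Selenium (mentioned in the other answer) instead of Puppeteer. This is a rough sketch, not a tested solution for this site: it assumes Chrome and a matching driver are available, and fetch_rendered_html and count_tags are hypothetical helper names.

```python
from collections import Counter

from bs4 import BeautifulSoup


def fetch_rendered_html(url, timeout=10):
    """Load url in headless Chrome and return the DOM after scripts have run."""
    # Imported here so the parsing helper below works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until the document reports it has finished loading. Data fetched
        # by later AJAX calls may need an explicit wait on a specific element.
        WebDriverWait(driver, timeout).until(
            lambda d: d.execute_script("return document.readyState") == "complete"
        )
        return driver.page_source
    finally:
        driver.quit()


def count_tags(html):
    """Count tag names in html, handy for comparing static and rendered markup."""
    soup = BeautifulSoup(html, "html.parser")
    return Counter(tag.name for tag in soup.find_all(True))
```

Calling count_tags(fetch_rendered_html(URL)) and comparing the result with count_tags(requests.get(URL).text) should show how much of the page only exists after rendering.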

Upvotes: 0
