Reputation: 37
I've been trying for the whole day to scrape a piece of text from this website: 'https://bdif.amf-france.org/fr?typesInformation=DD'
I'm using requests and BeautifulSoup but I can't seem to find the correct class/id. My code is below:
import requests
from bs4 import BeautifulSoup
source = requests.get('https://bdif.amf-france.org/fr?typesInformation=DD').text
soup = BeautifulSoup(source, 'lxml')
article = soup.find('results ng-star-inserted')
print(article)
The text I'm trying to find is the name underneath "Déclaration des dirigeants". I always get a "None" result. Let me know if you know how to solve this or if you know what I'm doing wrong.
Upvotes: 0
Views: 159
Reputation: 28565
The page is dynamic, meaning requests only returns the static HTML, before any JavaScript has filled in the results. You can either a) use something like Selenium, which lets the page render so that you can then parse the rendered HTML, or b) get the data directly from the API.
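Separately from the rendering issue, `soup.find('results ng-star-inserted')` would return None even on rendered HTML, because the first argument to `find` is a tag name, not a class. A minimal sketch of looking up by class instead; the HTML string here is a made-up stand-in, and the class names are only assumed from the question, not checked against the real page:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for rendered HTML; the real page's markup may differ.
html = '<div class="results ng-star-inserted"><span>ABC ARBITRAGE</span></div>'
soup = BeautifulSoup(html, 'html.parser')

# The string is treated as a tag name, and no such tag exists.
by_tag = soup.find('results ng-star-inserted')

# A CSS selector matches elements that carry both classes.
by_css = soup.select_one('.results.ng-star-inserted')
name = by_css.get_text(strip=True)

print(by_tag)
print(name)
```

Option b) avoids HTML parsing altogether; the request looks like this: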
import pandas as pd
import requests
# All filters go in params, so the URL itself carries no query string
url = 'https://bdif.amf-france.org/back/api/v1/informations'
payload = {
    'typesInformation': 'DD',
    'from': '0',
    'size': '10000',
}
jsonData = requests.get(url, params=payload).json()
hits = jsonData['hits']['hits']
# One row per entry in each hit's '_source' -> 'societes' list
df = pd.json_normalize(hits, record_path=['_source', 'societes'])
print(df)
Output:
                  role     raisonSociale       jeton
0     SocieteConcernee     ABC ARBITRAGE  RS00003494
1     SocieteConcernee           ALBIOMA  RS00002125
2     SocieteConcernee           ALBIOMA  RS00002125
3     SocieteConcernee  THERMADOR GROUPE  RS00002078
4     SocieteConcernee          ENVEA SA  RS00004271
                   ...               ...         ...
9995  SocieteConcernee        TOTAL S.A.  RS00003321
9996  SocieteConcernee          SEB S.A.  RS00002793
9997  SocieteConcernee     SOLOCAL GROUP  RS00004089
9998  SocieteConcernee         LATECOERE  RS00001460
9999  SocieteConcernee           EDENRED  RS00005100
[10000 rows x 3 columns]
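If you only need the names and would rather skip pandas, the same flattening can be done in plain Python. This is a rough sketch on a hand-made sample: the nesting (`_source` containing a `societes` list) is assumed from the `record_path` used above, and the sample records are illustrative, not a real API response.

```python
# Hypothetical sample shaped like jsonData['hits']['hits'] is assumed to be.
hits = [
    {'_source': {'societes': [
        {'role': 'SocieteConcernee', 'raisonSociale': 'ABC ARBITRAGE', 'jeton': 'RS00003494'},
    ]}},
    {'_source': {'societes': [
        {'role': 'SocieteConcernee', 'raisonSociale': 'ALBIOMA', 'jeton': 'RS00002125'},
    ]}},
]

# One row per company entry, which is roughly what record_path produces.
rows = [societe for hit in hits for societe in hit['_source']['societes']]

# Unique company names, sorted alphabetically.
names = sorted({row['raisonSociale'] for row in rows})
print(names)
```

With the real response, `hits` would come from `jsonData['hits']['hits']` instead of the sample list.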
Upvotes: 4