Reputation: 529
How can I get data from this website? It seems to be a JSON structure. Is it possible to get it with BeautifulSoup?
import requests
from bs4 import BeautifulSoup
import pandas as pd
r = requests.get(url)
soup = BeautifulSoup(r.text, 'lxml')
Upvotes: 1
Views: 170
Reputation: 195543
You can construct a pandas DataFrame directly from the JSON data, without BeautifulSoup. For example:
import requests
import pandas as pd
url = "https://www.ultimatetennisstatistics.com/statsLeadersTable?current=1&rowCount=-1&sort%5Bvalue%5D=desc&searchPhrase=&category=aces&season=&fromDate=&toDate=&level=&bestOf=&surface=&indoor=&speed=&round=&result=&tournamentId=&opponent=&countryId=&minEntries=&active=true&_=1622884929848"
data = requests.get(url).json()
df = pd.json_normalize(data["rows"])
print(df)
df.to_csv("data.csv", index=False)
Prints:
    rank  playerId                name  value   country.name  country.id  country.code
 0     1      3333        Ivo Karlovic  13687        Croatia         CRO            hr
 1     2      4544          John Isner  12806  United States         USA            us
 2     3      3819       Roger Federer  11371    Switzerland         SUI            ch
 3     5      3852     Feliciano Lopez   9920          Spain         ESP            es
 4     8      5016         Sam Querrey   8466  United States         USA            us
 5     9      5670        Milos Raonic   8130         Canada         CAN            ca
 6    13      4728      Kevin Anderson   7262   South Africa         RSA            za
 7    14      5220         Marin Cilic   7246        Croatia         CRO            hr
 8    17      4541  Jo Wilfried Tsonga   6634         France         FRA            fr
 9    18      4789        Gael Monfils   6245         France         FRA            fr
10    20      4920      Novak Djokovic   6069         Serbia         SRB            rs
11    21      4526       Stan Wawrinka   5900    Switzerland         SUI            ch
...
It also saves the data to data.csv.
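To see why the columns come out as country.name, country.id and country.code, here is a minimal offline sketch of what pd.json_normalize does with nested records. The `sample` payload is hypothetical: its shape (a "rows" list whose items hold a nested "country" dict) is an assumption inferred from the printed column names, not a copy of the real response.

```python
import pandas as pd

# Hypothetical payload: shape assumed from the printed columns above,
# values copied from the first two printed rows.
sample = {
    "rows": [
        {
            "rank": 1,
            "playerId": 3333,
            "name": "Ivo Karlovic",
            "value": 13687,
            "country": {"name": "Croatia", "id": "CRO", "code": "hr"},
        },
        {
            "rank": 2,
            "playerId": 4544,
            "name": "John Isner",
            "value": 12806,
            "country": {"name": "United States", "id": "USA", "code": "us"},
        },
    ]
}

# json_normalize flattens each nested dict into dotted column names,
# e.g. row["country"]["name"] becomes the column "country.name".
df_sample = pd.json_normalize(sample["rows"])
print(df_sample)
```

If the real response nests its data differently, the dotted column names will differ accordingly.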
Upvotes: 2