Reputation: 39
I don't know why I can't scrape spans with a specific class.
Example of a span that I want to scrape:
<span class="player-matches__tournament-location">MELBOURNE, AUSTRALIA</span>
Code that I used:
import requests
from bs4 import BeautifulSoup
url = "https://www.wtatennis.com/players/326408/iga-swiatek/#matches"
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
spans = soup.find_all('span', attrs={'class' : "player-matches__tournament-location"})
The output of the above code is an empty list. What should I change, or does the website block scraping?
Upvotes: 0
Views: 72
Reputation: 195448
The page loads the data from an external URL via JavaScript, so BeautifulSoup
doesn't see it: requests.get only returns the initial HTML, before any scripts run.
You can use the requests module to simulate these calls. For example:
import requests
import pandas as pd
# 326408 is the number from your URL in the question
url = 'https://api.wtatennis.com/tennis/players/326408/matches/?page=0&pageSize=50&id=326408&year=&type=S&sort=desc&tournamentGroupId='
data = requests.get(url).json()
df = pd.DataFrame(data['matches'])
df = pd.concat([df, df.pop('opponent').apply(pd.Series)], axis=1)
df = pd.concat([df, df.pop('tournament').apply(pd.Series)], axis=1)
print(df.head().to_markdown(index=False))
Prints:
Country | DrawSizes | PrizeMoney | PrizeWon | StartDate | Surface | TournamentLevel | TournamentName | TournamentType | city | entry_rank_1 | entry_rank_2 | entry_type_1 | entry_type_2 | opponent_partner | partner | player_1 | player_2 | player_3 | player_4 | points_1 | points_2 | points_bonus_1 | points_bonus_2 | points_champ_1 | points_champ_2 | qpm_flag | rank_1 | rank_2 | rank_code_1 | rank_code_2 | reason_code | round_name | s_d_flag | scores | seed_1 | seed_2 | spc_rank_1 | spc_rank_2 | team_name_1 | team_name_2 | tourn_nbr | tourn_round | tourn_year | winner | id | firstName | lastName | fullName | countryCode | dateOfBirth | metadata | tournamentGroup | year | title | startDate | endDate | surface | inOutdoor | city | country | singlesDrawSize | doublesDrawSize | prizeMoney | prizeMoneyCurrency | liveScoringId |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
AUSTRALIA | nan | 236578 | 2023-01-16T00:00:00+00:00 | HARD | GS | AUSTRALIAN OPEN | GS | 326408 | 324166 | 240 | nan | 240 | nan | M | 1 | 25 | W | R16 | S | 6-4 6-4 | 1 | 22 | nan | SWIATEK, IGA POL | RYBAKINA, ELENA KAZ | 901 | 4 | 2023 | 2 | 324166 | Elena | Rybakina | Elena Rybakina | KAZ | 1999-06-17 | nan | {'id': 901, 'name': 'AUSTRALIAN OPEN', 'level': 'Grand Slam', 'metadata': None} | 2023 | Australian Open - Melbourne, AUS | 2023-01-16 | 2023-01-29 | Hard | O | MELBOURNE | AUSTRALIA | 128 | 64 | 0 | USD | 901 | |||||||||||||||
AUSTRALIA | nan | 236578 | 2023-01-16T00:00:00+00:00 | HARD | GS | AUSTRALIAN OPEN | GS | Q | 326408 | 321158 | 240 | 70 | 240 | 70 | M | 1 | 100 | W | R32 | S | 6-0 6-1 | 1 | nan | nan | SWIATEK, IGA POL | BUCSA, CRISTINA ESP | 901 | 3 | 2023 | 1 | 321158 | Cristina | Bucsa | Cristina Bucsa | ESP | 1998-01-01 | nan | {'id': 901, 'name': 'AUSTRALIAN OPEN', 'level': 'Grand Slam', 'metadata': None} | 2023 | Australian Open - Melbourne, AUS | 2023-01-16 | 2023-01-29 | Hard | O | MELBOURNE | AUSTRALIA | 128 | 64 | 0 | USD | 901 | ||||||||||||||
AUSTRALIA | nan | 236578 | 2023-01-16T00:00:00+00:00 | HARD | GS | AUSTRALIAN OPEN | GS | 326408 | 325898 | 240 | 10 | 240 | 10 | M | 1 | 84 | W | R64 | S | 6-2 6-3 | 1 | nan | nan | SWIATEK, IGA POL | OSORIO, CAMILA COL | 901 | 2 | 2023 | 1 | 325898 | Camila | Osorio | Camila Osorio | COL | 2001-12-22 | nan | {'id': 901, 'name': 'AUSTRALIAN OPEN', 'level': 'Grand Slam', 'metadata': None} | 2023 | Australian Open - Melbourne, AUS | 2023-01-16 | 2023-01-29 | Hard | O | MELBOURNE | AUSTRALIA | 128 | 64 | 0 | USD | 901 | |||||||||||||||
AUSTRALIA | nan | 236578 | 2023-01-16T00:00:00+00:00 | HARD | GS | AUSTRALIAN OPEN | GS | 326408 | 325940 | 240 | 0 | 240 | 0 | M | 1 | 69 | W | R128 | S | 6-4 7-5 | 1 | nan | nan | SWIATEK, IGA POL | NIEMEIER, JULE GER | 901 | 1 | 2023 | 1 | 325940 | Jule | Niemeier | Jule Niemeier | GER | 1999-08-12 | nan | {'id': 901, 'name': 'AUSTRALIAN OPEN', 'level': 'Grand Slam', 'metadata': None} | 2023 | Australian Open - Melbourne, AUS | 2023-01-16 | 2023-01-29 | Hard | O | MELBOURNE | AUSTRALIA | 128 | 64 | 0 | USD | 901 | |||||||||||||||
AUSTRALIA | 0M/0Q/0D | 7.5e+06 | 384375 | 2022-12-29T00:00:00+00:00 | HARD | P | UNITED CUP | VS | 326408 | 316956 | 125 | 0 | 125 | 0 | M | 1 | 3 | W | SF | S | 6-2 6-2 | nan | nan | nan | PEGULA, JESSICA USA | SWIATEK, IGA POL | 2084 | 3 | 2023 | 2 | 316956 | Jessica | Pegula | Jessica Pegula | USA | 1994-02-24 | nan | {'id': 2084, 'name': 'UNITED CUP', 'level': 'WTA 500', 'metadata': None} | 2023 | United Cup - Australia, AUS | 2022-12-29 | 2023-01-08 | Hard | O | AUSTRALIA | 0 | 0 | 7500000 | USD | 2084 |
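To get back to the location strings from the question (for example "MELBOURNE, AUSTRALIA"), the city and country fields of each match's tournament entry can be joined. The following is only a minimal sketch: the helper name tournament_locations and the hand-written sample_matches are assumptions for illustration, and the city/country keys are taken from the column names in the output above, so the real response may differ.

```python
def tournament_locations(matches):
    """Return unique 'CITY, COUNTRY' strings in first-seen order.

    `matches` is assumed to be a list of dicts shaped like data['matches'],
    where each item has a nested 'tournament' dict with 'city' and 'country'.
    """
    seen = []
    for match in matches:
        # Tolerate a missing or null tournament entry.
        tournament = match.get("tournament") or {}
        # Keep only the parts that are present (some events have no city).
        parts = [p for p in (tournament.get("city"), tournament.get("country")) if p]
        if not parts:
            continue
        location = ", ".join(parts)
        if location not in seen:
            seen.append(location)
    return seen


# Hand-written sample (not real API output), shaped like data['matches'].
sample_matches = [
    {"round_name": "R16", "tournament": {"city": "MELBOURNE", "country": "AUSTRALIA"}},
    {"round_name": "R32", "tournament": {"city": "MELBOURNE", "country": "AUSTRALIA"}},
    {"round_name": "SF", "tournament": {"city": None, "country": "AUSTRALIA"}},
]

print(tournament_locations(sample_matches))
# ['MELBOURNE, AUSTRALIA', 'AUSTRALIA']
```

With the live response, the same helper would be called as tournament_locations(data['matches']), before the 'tournament' column is popped from the DataFrame.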
Upvotes: 2