Reputation: 15
I am trying to scrape several kinds of information for every Premier League player from the Transfermarkt website.
The relevant code is:
import requests
from bs4 import BeautifulSoup

# Full_Links (the team page URLs) and headers are defined earlier in the script.
# Create empty lists for the player links
playerLink1 = []
playerLink2 = []
playerLink3 = []
# For each team link page...
for i in range(len(Full_Links)):
    # ...download the team page and process the HTML...
    squadPage = requests.get(Full_Links[i], headers=headers)
    squadTree = squadPage.text
    SquadSoup = BeautifulSoup(squadTree, 'html.parser')
    # ...extract the player links...
    playerLocation = SquadSoup.find("div", {"class": "responsive-table"}).find_all("a", {"class": "spielprofil_tooltip"})
    for a in playerLocation:
        playerLink1.append(a['href'])
# Drop duplicate links while keeping their order
for x in playerLink1:
    if x not in playerLink2:
        playerLink2.append(x)
# For each player link...
for j in range(len(playerLink2)):
    # ...save the link, complete with domain...
    temp2 = "https://www.transfermarkt.co.uk" + playerLink2[j]
    # ...and add the finished link to the playerLink3 list
    playerLink3.append(temp2)
# playerLink3_u is not defined in the code shown; it is assumed here to be
# the de-duplicated version of playerLink3
playerLink3_u = list(dict.fromkeys(playerLink3))
# Populate lists with each player
# For each player...
for i in range(len(playerLink3_u)):
    # ...download and process the player page collected earlier...
    playerPage = requests.get(playerLink3_u[i], headers=headers)
    playerTree = playerPage.text
    PlayerSoup = BeautifulSoup(playerTree, 'html.parser')
    # ...find the relevant datapoint for each player, starting with name...
    tempName = PlayerSoup.find("div", {"class":"spielerdaten "}).find_all("a",{"class":"spielprofil_tooltip"})
The problem is the last line: the "tempName" lookup is wrong, because I cannot find any class that identifies the name of the player.
This is the link for a player https://www.transfermarkt.co.uk/ederson/profil/spieler/238223
Any tips on how I can extract data from this HTML? I need more fields from the same place in addition to the name.
Upvotes: 0
Views: 432
Reputation: 1937
I don't know whether this is a real solution for your case, but you could use the XPath of the element instead of its class. The XPath is the path through the HTML document to one specific element. So, if the player's name sits at the same position in the HTML on every page, you can scrape that element every time.
To find the XPath in Firefox, locate the element in the Inspector, right-click it, then choose Copy -> XPath.
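As a rough illustration of the XPath idea described above, here is a minimal sketch that uses only the standard library. The markup in SAMPLE is a hypothetical, simplified stand-in and not the real Transfermarkt page, and the class name "auflistung" and the helper extract_by_xpath are assumptions made for the example. ElementTree only supports a subset of XPath and needs well-formed markup, so for a real page you would more likely hand the copied XPath to a full HTML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup standing in for a player profile page.
SAMPLE = """
<html><body>
  <div class="spielerdaten">
    <table class="auflistung">
      <tr><th>Full name:</th><td>Ederson Santana de Moraes</td></tr>
      <tr><th>Position:</th><td>Goalkeeper</td></tr>
    </table>
  </div>
</body></html>
"""


def extract_by_xpath(markup, path):
    """Return the stripped text of the first element matching path, or None."""
    root = ET.fromstring(markup.strip())
    node = root.find(path)
    if node is None or node.text is None:
        return None
    return node.text.strip()


# The same path works on every page that shares this layout.
full_name = extract_by_xpath(SAMPLE, ".//table[@class='auflistung']/tr[1]/td")
position = extract_by_xpath(SAMPLE, ".//table[@class='auflistung']/tr[2]/td")
print(full_name, "/", position)
```

The weakness of a positional path such as tr[1] is that it silently returns the wrong field if a page has an extra or missing row, so it is worth checking the label cell as well.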
Upvotes: 0
Reputation: 28595
The page is dynamic and rendered after the initial request. You'll have to access the data through an API (if available), or use a browser automation tool like Selenium to open the page, let it render, and then pull the HTML:
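from io import StringIO
import pandas as pd
from selenium import webdriver
# Selenium 3 style call; newer Selenium versions take the driver path through a Service object instead
driver = webdriver.Chrome('C:/chromedriver_win32/chromedriver.exe')
# driver.get() returns None; the rendered HTML is available as driver.page_source
driver.get('https://www.transfermarkt.co.uk/ederson/profil/spieler/238223')
df = pd.read_html(StringIO(driver.page_source))[0]
driver.quit()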
Output:
print(df.to_string())
    0                                 1
0   Full name:                        Ederson Santana de Moraes
1   Date of birth:                    Aug 17, 1993
2   Place of birth:                   Osasco (SP)
3   Age:                              26
4   Height:                           1,88 m
5   Citizenship:                      Brazil Portugal
6   Position:                         Goalkeeper
7   Foot:                             left
8   Player agent:                     Gestifute
9   Current club:                     Manchester City
10  Joined:                           Jul 1, 2017
11  Contract expires:                 30.06.2025
12  Date of last contract extension:  May 13, 2018
13  Outfitter:                        Nike
14  Social media:                     NaN
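Since the question asks for several fields and not only the name, the two-column table printed above can be turned into a lookup keyed by label. The following is only a sketch: the rows are copied by hand from that output, and the helpers rows_to_record and height_in_metres are hypothetical names introduced for the example.

```python
# A few (label, value) rows copied from the table printed above.
rows = [
    ("Full name:", "Ederson Santana de Moraes"),
    ("Date of birth:", "Aug 17, 1993"),
    ("Height:", "1,88 m"),
    ("Position:", "Goalkeeper"),
    ("Social media:", None),
]


def rows_to_record(rows):
    """Map each label, without its trailing colon, to its value."""
    record = {}
    for label, value in rows:
        key = label.strip().rstrip(":")
        record[key] = value
    return record


def height_in_metres(text):
    """Convert a string such as '1,88 m' to a float."""
    number = text.replace("m", "").strip().replace(",", ".")
    return float(number)


player = rows_to_record(rows)
print(player["Full name"])
print(height_in_metres(player["Height"]))
```

With the DataFrame from the answer, the rows could be produced with df.itertuples(index=False, name=None), assuming the table keeps the two columns shown. Missing values would then arrive as NaN instead of None.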
Upvotes: 1