Haroon

Reputation: 15

Scraping something from a page which does not have a unique class

I am trying to scrape several types of information for every player in the Premier League from their Transfermarkt pages.

The relevant code is:

# Lists for the player links: raw hrefs, de-duplicated hrefs, and full URLs
playerLink1 = []
playerLink2 = []
playerLink3 = []
# For each team link page...
for i in range(len(Full_Links)):
    # ...download the team page and parse the HTML...
    squadPage = requests.get(Full_Links[i], headers=headers)
    squadTree = squadPage.text
    SquadSoup = BeautifulSoup(squadTree, 'html.parser')

    # ...extract the player links...
    playerLocation = SquadSoup.find("div", {"class": "responsive-table"}).find_all("a", {"class": "spielprofil_tooltip"})

    for a in playerLocation:
        playerLink1.append(a['href'])
        # ...keeping each relative link only once...
        if a['href'] not in playerLink2:
            playerLink2.append(a['href'])

# ...then, for each unique player link...
for j in range(len(playerLink2)):
    # ...build the full URL, complete with domain...
    temp2 = "https://www.transfermarkt.co.uk" + playerLink2[j]
    # ...and add the finished link to the playerLink3 list.
    playerLink3.append(temp2)

#Populate lists with each player

#For each player...
for i in range(len(playerLink3_u)):
    #...download and process the two pages collected earlier...
    playerPage = requests.get(playerLink3_u[i], headers = headers)
    playerTree = playerPage.text
    PlayerSoup = BeautifulSoup(playerTree,'html.parser')

#...find the relevant datapoint for each player, starting with name...
    tempName = PlayerSoup.find("div", {"class":"spielerdaten "}).find_all("a",{"class":"spielprofil_tooltip"})

The problem is in the last line, "tempName" (which is wrong): there is no unique class I can use to find the name of the soccer player.

This is the link for one player: https://www.transfermarkt.co.uk/ederson/profil/spieler/238223

Any tips on how I can extract data from this HTML? I need more data from the same place in addition to the name.

Upvotes: 0

Views: 432

Answers (2)

Charalamm

Reputation: 1937

I don't know if it is a real solution for your case, but maybe you could use the XPath of the element instead of its class. The XPath is the path through the HTML document to one specific element. So, if the player's name is in the same position in the HTML on every page, you can scrape that element every time.

To find the XPath in Firefox, locate the element in the Inspector, right-click it, and choose Copy -> XPath.
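
As a rough illustration of the idea, here is a minimal sketch using only the standard library. The HTML snippet is a made-up, well-formed stand-in (the real page's markup may differ), and `xml.etree.ElementTree` supports only a subset of XPath and only well-formed markup, so for a real page a lenient parser such as `lxml.html` would likely be needed instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for a player page; the real markup may differ.
page = """
<html><body>
  <div id="main">
    <table>
      <tr><th>Full name:</th><td>Ederson Santana de Moraes</td></tr>
      <tr><th>Position:</th><td>Goalkeeper</td></tr>
    </table>
  </div>
</body></html>
"""

root = ET.fromstring(page)

# Positional path, similar in spirit to what "Copy -> XPath" produces:
# the first row of the table inside the div, then its data cell.
full_name = root.find("./body/div/table/tr[1]/td").text

# Lookup by label text instead of position: the row whose <th> reads "Position:".
position = root.find(".//tr[th='Position:']/td").text

print(full_name, position)
```

The positional path breaks as soon as a row is added or removed above the target, while the label-based path only depends on the label text, so the second form is usually the safer choice when no class is available.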

Upvotes: 0

chitown88

Reputation: 28595

The page is dynamic and rendered after the initial request. You'll have to access the data through an API (if available), or use a browser automation tool like Selenium to open the page, let it render, and then pull the HTML:

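from io import StringIO

import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Selenium 4 takes the chromedriver path through a Service object
driver = webdriver.Chrome(service=Service('C:/chromedriver_win32/chromedriver.exe'))

# driver.get() returns None; the rendered HTML is read from driver.page_source
driver.get('https://www.transfermarkt.co.uk/ederson/profil/spieler/238223')
# Recent pandas versions expect a file-like object rather than a raw HTML string
df = pd.read_html(StringIO(driver.page_source))[0]
driver.quit()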

Output:

print(df.to_string())
                                   0                          1
0                         Full name:  Ederson Santana de Moraes
1                     Date of birth:               Aug 17, 1993
2                    Place of birth:                Osasco (SP)
3                               Age:                         26
4                            Height:                     1,88 m
5                       Citizenship:            Brazil Portugal
6                          Position:                 Goalkeeper
7                              Foot:                       left
8                      Player agent:                  Gestifute
9                      Current club:            Manchester City
10                           Joined:                Jul 1, 2017
11                 Contract expires:                 30.06.2025
12  Date of last contract extension:               May 13, 2018
13                        Outfitter:                       Nike
14                     Social media:                        NaN
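
Since the question asks for several fields and not just the name, one possible follow-up is to turn the two-column table into a label-to-value mapping. This is only a sketch: the small `df` below is rebuilt by hand from a few rows of the output above so that the example runs without a browser, and the name `player` is just an illustrative choice.

```python
import pandas as pd

# A few rows copied from the output above, standing in for the scraped table
df = pd.DataFrame([
    ["Full name:", "Ederson Santana de Moraes"],
    ["Date of birth:", "Aug 17, 1993"],
    ["Position:", "Goalkeeper"],
])

# Strip the trailing colon from each label and use it as the dictionary key
labels = df[0].str.rstrip(":")
player = dict(zip(labels, df[1]))

print(player["Full name"])
print(player["Position"])
```

Looking fields up by label rather than by row number keeps the code working if a player's table has extra or missing rows.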

Upvotes: 1
