Zakan

Reputation: 51

What am I doing wrong in this for loop for Web Scraping with bs4?

I'm trying to loop through a list of players on Transfermarkt, enter each profile, get their profile picture, and then scrape the original list of information. The latter I've achieved (as you will see in my code), but the former I can't seem to get working. I'm not an expert at this, and I've received help with my code.

I want to save the source link for each player's picture, not the image itself, and then store that link in "PlayerImgURL" in my dataframe (row 73).

This is my error message:

(.venv) PS C:\Users\cljkn\Desktop\Python scraper github> & "c:/Users/cljkn/Desktop/Python scraper github/.venv/Scripts/python.exe" "c:/Users/cljkn/Desktop/Python scraper github/.vscode/test.py"
  File "c:/Users/cljkn/Desktop/Python scraper github/.vscode/test.py", line 45
    for page in range(1, 21):
    ^
SyntaxError: invalid syntax

Thanks.

from bs4 import BeautifulSoup
import requests
import pandas as pd

playerID = []
playerImage = []
playerName = []
result = []

for page in range(1, 21):

    r = requests.get("https://www.transfermarkt.com/spieler-statistik/wertvollstespieler/marktwertetop?land_id=0&ausrichtung=alle&spielerposition_id=alle&altersklasse=alle&jahrgang=0&kontinent_id=0&plus=1",
        params= {"page": page},
        headers= {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:74.0) Gecko/20100101 Firefox/74.0"}
    )
    soup = BeautifulSoup(r.content, "html.parser")

    links = soup.select('a.spielprofil_tooltip')

    for i in range(len(links)):
        playerID.append(links[i].get('id'))

    for i in range(len(playerID)):
        playerID[i] = 'https://www.transfermarkt.com/kylian-mbappe/profil/spieler/'+playerID[i]
        playerID = list(set(playerID))

    for i in range(len(playerID)):

        r = requests.get(playerID[i],
            params= {"page": page},
            headers= {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:74.0) Gecko/20100101 Firefox/74.0"}
        )
    soup = BeautifulSoup(r.content, "html.parser")

    name = soup.find_all('h1')

    for image in soup.find_all('img'):
        playerName.append('title')

        playerImage.append[image.get('src')




    for page in range(1, 21):

        r = requests.get("https://www.transfermarkt.com/spieler-statistik/wertvollstespieler/marktwertetop?land_id=0&ausrichtung=alle&spielerposition_id=alle&altersklasse=alle&jahrgang=0&kontinent_id=0&plus=1",
            params= {"page": page},
            headers= {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:74.0) Gecko/20100101 Firefox/74.0"}
        )
        soup = BeautifulSoup(r.content, "html.parser")


        tr = soup.find_all("tbody")[1].find_all("tr", recursive=False)

        result.extend([
            { 

            "Club": t[4].find("img")["alt"],
            "Age": t[2].text.strip(),
            "GamesPlayed": t[6].text.strip(),
            "GoalsDone": t[7].text.strip(),
            "OwnGoals": t[8].text.strip(),
            "Assists": t[9].text.strip(),
            "YellowCards": t[10].text.strip(),
            "SecondYellow": t[11].text.strip(),
            "StraightRed": t[12].text.strip(),
            "SubsOn": t[13].text.strip(),
            "SubsOff": t[14].text.strip(),
            "Nationality": t[3].find("img")["alt"], # for all nationality : [ i["alt"] for i in t[3].find_all("img")], 
            "Position": t[1].find_all("td")[2].text,
            "Value": t[5].text.strip(),
            #"PlayerImgURL":
            "ClubImgURL": t[4].find("img")["src"],
            "CountryImgURL": t[3].find("img")["src"] # for all country url: [ i["src"] for i in t[3].find_all("img")]
            }

            for t in (t.find_all(recursive=False) for t in tr)
        ])



df = pd.DataFrame(result,{'Name':playerImage, 'Source':playerImage})


#df.to_csv (r'S:\_ALL\Internal Projects\Introduction_2020\Transfermarkt\PlayerDetails.csv', index = False, header=True)

print(df)

Upvotes: 0

Views: 106

Answers (1)

Hamza Lachi

Reputation: 1064

The issue is in this line:

playerImage.append[image.get('src')

append is a method, so it has to be called with parentheses, and the closing parenthesis is missing entirely. Because of the unbalanced bracket, Python doesn't notice anything wrong until it reaches a later statement, which is why the SyntaxError points at line 45 (for page in range(1, 21):) rather than at this line. Replace it with:

playerImage.append(image.get('src'))
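For reference, here is a minimal, self-contained sketch of the corrected pattern. The HTML snippet is a made-up stand-in for a fetched Transfermarkt profile page (the real page would come from requests.get as in your code); only the append fix and the img/src handling are the point here.

```python
from bs4 import BeautifulSoup

# Stand-in HTML for a player profile page (fetched with requests in your code)
html = """
<div class="dataBild">
  <img src="https://example.com/portrait/342229-1.jpg" title="Kylian Mbappe">
</div>
"""

soup = BeautifulSoup(html, "html.parser")

playerImage = []
for image in soup.find_all("img"):
    # append(...) is a method call: parentheses, not square brackets,
    # and the call must be closed
    playerImage.append(image.get("src"))

print(playerImage)
```

As a side note, playerName.append('title') in your question appends the literal string 'title' on every iteration; image.get('title') is most likely what was intended there.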

Upvotes: 2
