Reputation: 51
I'm trying to loop through a list of players on Transfermarkt, enter each profile, get their profile picture, and then scrape the original list of information. The latter I've achieved (as you will see in my code), but the former I can't seem to get working. I'm not an expert at this, and have received help with my code.
I want to save the source link for each player's picture, not the image itself, and then store that link in "PlayerImgURL" in my dataframe (row 73).
This is my error message:
(.venv) PS C:\Users\cljkn\Desktop\Python scraper github> & "c:/Users/cljkn/Desktop/Python scraper github/.venv/Scripts/python.exe" "c:/Users/cljkn/Desktop/Python scraper github/.vscode/test.py"
File "c:/Users/cljkn/Desktop/Python scraper github/.vscode/test.py", line 45
for page in range(1, 21):
^
SyntaxError: invalid syntax
Thanks.
from bs4 import BeautifulSoup
import requests
import pandas as pd

playerID = []
playerImage = []
playerName = []
result = []

for page in range(1, 21):
    r = requests.get("https://www.transfermarkt.com/spieler-statistik/wertvollstespieler/marktwertetop?land_id=0&ausrichtung=alle&spielerposition_id=alle&altersklasse=alle&jahrgang=0&kontinent_id=0&plus=1",
        params= {"page": page},
        headers= {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:74.0) Gecko/20100101 Firefox/74.0"}
    )
    soup = BeautifulSoup(r.content, "html.parser")
    links = soup.select('a.spielprofil_tooltip')
    for i in range(len(links)):
        playerID.append(links[i].get('id'))

for i in range(len(playerID)):
    playerID[i] = 'https://www.transfermarkt.com/kylian-mbappe/profil/spieler/'+playerID[i]

playerID = list(set(playerID))

for i in range(len(playerID)):
    r = requests.get(playerID[i],
        params= {"page": page},
        headers= {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:74.0) Gecko/20100101 Firefox/74.0"}
    )
    soup = BeautifulSoup(r.content, "html.parser")
    name = soup.find_all('h1')
    for image in soup.find_all('img'):
        playerName.append('title')
        playerImage.append[image.get('src')

for page in range(1, 21):
    r = requests.get("https://www.transfermarkt.com/spieler-statistik/wertvollstespieler/marktwertetop?land_id=0&ausrichtung=alle&spielerposition_id=alle&altersklasse=alle&jahrgang=0&kontinent_id=0&plus=1",
        params= {"page": page},
        headers= {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:74.0) Gecko/20100101 Firefox/74.0"}
    )
    soup = BeautifulSoup(r.content, "html.parser")
    tr = soup.find_all("tbody")[1].find_all("tr", recursive=False)
    result.extend([
        {
            "Club": t[4].find("img")["alt"],
            "Age": t[2].text.strip(),
            "GamesPlayed": t[6].text.strip(),
            "GoalsDone": t[7].text.strip(),
            "OwnGoals": t[8].text.strip(),
            "Assists": t[9].text.strip(),
            "YellowCards": t[10].text.strip(),
            "SecondYellow": t[11].text.strip(),
            "StraightRed": t[12].text.strip(),
            "SubsOn": t[13].text.strip(),
            "SubsOff": t[14].text.strip(),
            "Nationality": t[3].find("img")["alt"], # for all nationality : [ i["alt"] for i in t[3].find_all("img")],
            "Position": t[1].find_all("td")[2].text,
            "Value": t[5].text.strip(),
            #"PlayerImgURL":
            "ClubImgURL": t[4].find("img")["src"],
            "CountryImgURL": t[3].find("img")["src"] # for all country url: [ i["src"] for i in t[3].find_all("img")]
        }
        for t in (t.find_all(recursive=False) for t in tr)
    ])

df = pd.DataFrame(result,{'Name':playerImage, 'Source':playerImage})
#df.to_csv (r'S:\_ALL\Internal Projects\Introduction_2020\Transfermarkt\PlayerDetails.csv', index = False, header=True)
print(df)
Upvotes: 0
Views: 106
Reputation: 1064
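The issue is in this line, which opens a square bracket after append and never closes it, so Python reports the syntax error at the next statement (the for loop on line 45):
playerImage.append[image.get('src')
Replace it with this line, which calls append with parentheses:
playerImage.append(image.get('src'))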
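To see why the traceback points at the for loop rather than at the append line, here is a minimal sketch that compiles two tiny snippets with the built-in compile(). The snippets, the helper name syntax_error_line and the 'a.png' value are made up for illustration; only the bracket mistake mirrors the question. Which line gets reported can vary with the Python version: the interpreter in the traceback flags the following line, while newer versions may point at the unclosed bracket itself.

```python
# Hypothetical snippet mirroring the question's bug:
# "append[" opens a square bracket that is never closed.
broken = (
    "images = []\n"
    "images.append['a.png'\n"
    "for page in range(1, 21):\n"
    "    pass\n"
)

# Same snippet with the call written with parentheses and closed properly.
fixed = (
    "images = []\n"
    "images.append('a.png')\n"
    "for page in range(1, 21):\n"
    "    pass\n"
)


def syntax_error_line(source):
    """Return the line number of the SyntaxError in source, or None if it compiles."""
    try:
        compile(source, "<snippet>", "exec")
    except SyntaxError as exc:
        return exc.lineno
    return None


print("broken snippet fails at line:", syntax_error_line(broken))
print("fixed snippet fails at line:", syntax_error_line(fixed))
```

Closing the bracket alone would not be enough: append[...] with square brackets tries to index the method instead of calling it, which fails at runtime with a TypeError, so the parentheses matter too.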
Upvotes: 2
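On the original goal of storing each picture link under "PlayerImgURL": list(set(playerID)) discards the order of the IDs, so the image list no longer lines up with the table rows. One possible approach, sketched below, is to key the links by player ID and look them up per row. Everything here is an assumption for illustration: the "PlayerID" key (the rows in the question do not have it), the helper name attach_image_urls, and all sample values.

```python
# All values below are made-up placeholders, not real Transfermarkt data.
rows = [
    {"PlayerID": "101", "Club": "Club A", "Age": "21"},
    {"PlayerID": "102", "Club": "Club B", "Age": "28"},
]

# Hypothetical result of the profile loop: player ID -> image source link.
image_by_id = {
    "101": "https://example.com/img/101.jpg",
    "102": "https://example.com/img/102.jpg",
}


def attach_image_urls(rows, image_by_id, key="PlayerID", column="PlayerImgURL"):
    """Return copies of the rows with `column` looked up by each row's ID.

    Rows whose ID has no entry in image_by_id get None.
    """
    return [{**row, column: image_by_id.get(row[key])} for row in rows]


result = attach_image_urls(rows, image_by_id)
for row in result:
    print(row["PlayerID"], row["PlayerImgURL"])
```

A list built this way can be passed to pd.DataFrame(result) on its own. Note that the second positional argument of pd.DataFrame is the index, so the dict passed there in the question does not add columns.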