Reputation: 797
I'm trying to create a dict that holds the name, position and number of each player for each team. But when I try to create the final dictionary with players[team_name] = dict(zip(number, name, position)), it throws an error (see below). I can't seem to get it right; any thoughts on what I'm doing wrong here would be highly appreciated. Many thanks,
from bs4 import BeautifulSoup as soup
import requests
from lxml import html

clubs_url = 'https://www.premierleague.com/clubs'
parent_url = clubs_url.rsplit('/', 1)[0]
data = requests.get(clubs_url).text
html = soup(data, 'html.parser')
team_name = []
team_link = []
for ul in html.find_all('ul', {'class': 'block-list-5 block-list-3-m block-list-1-s block-list-1-xs block-list-padding dataContainer'}):
    for a in ul.find_all('a'):
        team_name.append(str(a.h4).split('>', 1)[1].split('<')[0])
        team_link.append(parent_url+a['href'])
team_link = [item.replace('overview', 'squad') for item in team_link]
team = dict(zip(team_name, team_link))
data = {}
players = {}
for team_name, team_link in team.items():
    player_page = requests.get(team_link)
    cont = soup(player_page.content, 'lxml')
    clud_ele = cont.find_all('span', attrs={'class' : 'playerCardInfo'})
    for i in clud_ele:
        v_number = [100 if v == "-" else v.get_text(strip=True) for v in i.select('span.number')]
        v_name = [v.get_text(strip=True) for v in i.select('h4.name')]
        v_position = [v.get_text(strip=True) for v in i.select('span.position')]
        key_number = [key for element in i.select('span.number') for key in element['class']]
        key_name = [key for element in i.select('h4.name') for key in element['class']]
        key_position = [key for element in i.select('span.position') for key in element['class']]
        number = dict(zip(key_number,v_number))
        name = dict(zip(key_name,v_name))
        position = dict(zip(key_position,v_name))
        players[team_name] = dict(zip(number,name,position))
---> 21 players[team_name] = dict(zip(number,name,position))
22
23
ValueError: dictionary update sequence element #0 has length 3; 2 is required
Upvotes: 0
Views: 93
Reputation: 6209
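There are many problems in your code. The one causing the error is that you are trying to instantiate a dictionary from a sequence of 3-item tuples, which is not possible: dict() expects each item to be a 2-item (key, value) pair, and zip(number, name, position) yields 3-item tuples. See the dict doc for details.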
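To make the failure concrete, here is a minimal sketch that reproduces it outside the scraper. The three lists are illustrative sample values (borrowed from the Arsenal output in the other answer), not the asker's actual variables, and keying on the number is just one possible way to form valid pairs.

```python
numbers = ['1', '26']
names = ['Bernd Leno', 'Emiliano Martínez']
positions = ['Goalkeeper', 'Goalkeeper']

# zip() over three sequences yields 3-tuples, which dict() rejects.
try:
    dict(zip(numbers, names, positions))
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# One way to get valid (key, value) pairs: key on the number and
# pack the remaining fields into a single value.
by_number = dict(zip(numbers, zip(names, positions)))

print(error_message)
print(by_number)
```

The second call works because the outer zip now produces 2-tuples: a number paired with a (name, position) tuple.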
That said, I would suggest reworking the whole nested loop.
First, clud_ele is a list of player info elements. Each one concerns only one player and provides only one position, one name and one number. So there is no need to store that information in lists; you could use simple variables:
for player_info in clud_ele:
    number = player_info.select('span.number')[0].get_text(strip=True)
    if number == '-':
        number = 100
    name = player_info.select('h4.name')[0].get_text(strip=True)
    position = player_info.select('span.position')[0].get_text(strip=True)
Here, the select method returns a list, but since you know that the list contains only one item, it is fine to take that item and call get_text on it. If you want to be sure, you could check that the length of player_info.select('span.number') is actually 1 before continuing.
This way you get scalar values, which are much easier to manipulate.
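The length check mentioned above could be sketched like this. The SAMPLE markup is a hypothetical stand-in for one player card (the real page may be structured differently), and single_text is an illustrative helper name, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating one player card; the real page may differ.
SAMPLE = '''
<span class="playerCardInfo">
  <span class="number">1</span>
  <h4 class="name">Bernd Leno</h4>
  <span class="position">Goalkeeper</span>
</span>
'''


def single_text(tag, selector):
    """Return the stripped text of the only match for selector, or raise."""
    matches = tag.select(selector)
    if len(matches) != 1:
        raise ValueError(f'expected 1 match for {selector!r}, got {len(matches)}')
    return matches[0].get_text(strip=True)


card = BeautifulSoup(SAMPLE, 'html.parser').find('span', class_='playerCardInfo')
number = single_text(card, 'span.number')
name = single_text(card, 'h4.name')
position = single_text(card, 'span.position')
print(number, name, position)
```

Raising on an unexpected match count makes a change in the page layout fail loudly instead of silently producing wrong data.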
Also note that I renamed i to player_info, which is much more explicit.
Then you can easily add your player data to your players dict:
players[team_name].append({'name': name,
                           'position': position,
                           'number': number})
This assumes that you create players[team_name] before the nested loop, with players[team_name] = [].
Edit: as stated in @kederrac's answer, using a defaultdict is a smart and convenient way to avoid creating each players[team_name] list manually.
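As a rough comparison of the two approaches, the sketch below fills the same structure once with a manually created list and once with a defaultdict. The rows list is illustrative data standing in for the scraped values.

```python
from collections import defaultdict

# Illustrative (team, player) rows standing in for the scraped values.
rows = [
    ('Arsenal', {'name': 'Bernd Leno', 'position': 'Goalkeeper', 'number': '1'}),
    ('Arsenal', {'name': 'Matt Macey', 'position': 'Goalkeeper', 'number': '33'}),
]

# Manual version: the list must exist before the first append.
players_manual = {}
for team_name, player in rows:
    if team_name not in players_manual:
        players_manual[team_name] = []
    players_manual[team_name].append(player)

# defaultdict version: a missing key gets an empty list automatically.
players_auto = defaultdict(list)
for team_name, player in rows:
    players_auto[team_name].append(player)

print(players_manual == dict(players_auto))
```

Both loops build the same mapping; the defaultdict simply removes the membership check.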
Finally, this will give you a players dict keyed by team_name, where each value is a list of player dicts with name, position and number keys.
This is the data structure you seem to want, but other structures are possible. Remember to think about your data structure so that it is both logical and easy to manipulate.
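To illustrate the trade-off, here is a small sketch that reads the list-of-dicts layout and then derives one alternative layout from it. The sample entries are illustrative, and indexing by player name is only an assumption about what might be convenient; it requires names to be unique within a team.

```python
players = {
    'Arsenal': [
        {'number': '1', 'position': 'Goalkeeper', 'name': 'Bernd Leno'},
        {'number': '2', 'position': 'Defender', 'name': 'Héctor Bellerín'},
    ],
}

# List of dicts: easy to filter on any field.
defenders = [p['name'] for p in players['Arsenal'] if p['position'] == 'Defender']

# Alternative layout: index each squad by player name for direct lookups.
# This assumes names are unique within a team.
by_name = {
    team: {p['name']: p for p in squad}
    for team, squad in players.items()
}

print(defenders)
print(by_name['Arsenal']['Bernd Leno']['number'])
```

The list form suits filtering and iteration, while the name-indexed form suits looking up a single known player.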
Upvotes: 1
Reputation: 17322
You can't instantiate a dict from 3-item tuples. The problem is that you pass three variables to zip, as in zip(number, name, position), and then try to build a dict from the result. Each item may contain only two elements: the key and the value.
I've rewritten the last part of your code:
from collections import defaultdict

data = {}
players = defaultdict(list)
for team_name, team_link in team.items():
    player_page = requests.get(team_link)
    cont = soup(player_page.text, 'lxml')
    clud_ele = cont.find_all('span', attrs={'class' : 'playerCardInfo'})
    for i in clud_ele:
        num = i.select('span.number')[0].get_text(strip=True)
        number = 100 if num == '-' else num
        name = i.select('h4.name')[0].get_text(strip=True)
        position = i.select('span.position')[0].get_text(strip=True)
        players[team_name].append({'number': number, 'position': position, 'name': name})
output:
defaultdict(list,
            {'Arsenal': [{'number': '1',
                          'position': 'Goalkeeper',
                          'name': 'Bernd Leno'},
                         {'number': '26',
                          'position': 'Goalkeeper',
                          'name': 'Emiliano Martínez'},
                         {'number': '33', 'position': 'Goalkeeper', 'name': 'Matt Macey'},
                         {'number': '2',
                          'position': 'Defender',
                          'name': 'Héctor Bellerín'},
.......................
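Note that in this output the squad numbers are strings, while the '-' fallback in the code is the integer 100, so the number field ends up with mixed types. If uniform integers are preferred, a small helper along these lines could be used; parse_number and its missing parameter are hypothetical names introduced here for illustration.

```python
def parse_number(raw, missing=100):
    """Convert a scraped squad number to int; '-' or blank maps to `missing`."""
    raw = raw.strip()
    if raw in ('', '-'):
        return missing
    return int(raw)


print([parse_number(v) for v in ['1', '26', '-']])
```

In the loop above this would replace the conditional expression, for example number = parse_number(num).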
Upvotes: 1