Reputation: 176
I'm trying to iterate over the current players of a certain football team. I have noticed that on Wikipedia the tables listing a team's players all share the same format. There are 4-6 such tables: 2 for the actual first-team squad, and the rest for players out on loan, youth players, etc. When I query the Wikipedia page with XPath in online tools, I get the result I want. But when I run the same query in Python with the lxml.html and requests libraries, the player tables are treated as a single table element instead of 4-6 separate tables, which makes it a headache to extract only the first-team players.
Here is my Python code:
import lxml.html
import requests

# wiki_prefix and get_player_info are defined elsewhere in my script.
def create_team_ontology(ontology_graph, team_url, team_name):
    res = requests.get(team_url)
    doc = lxml.html.fromstring(res.content)
    print(team_url)
    # Relative hrefs of the players listed in the squad tables
    club_players = doc.xpath("//table[3]/tbody//tr[position() > 1]//td[4]//span/a/@href")
    for player_suffix_url in club_players:
        print(player_suffix_url + '\n')
        player_url = wiki_prefix + player_suffix_url
        get_player_info(ontology_graph, player_url, team_name)
Here is an example, the Wikipedia page for Arsenal: https://en.wikipedia.org/wiki/Arsenal_F.C. In the page source it's easy to check that each table is a separate element, but my club_players list contains the hrefs of all the players under the Players section of that page.
And this is the query I run in the browser (Inspect, then Ctrl+F): //table[3]/tbody//tr[position() > 1]//td[4]//span/a/@href
Upvotes: 0
Views: 53
Reputation: 5915
Your code almost works. If I use the XPath I posted in the other topic:
from lxml import html
import requests

res = requests.get('https://en.wikipedia.org/wiki/Arsenal_F.C.')
doc = html.fromstring(res.content)
club_players = doc.xpath('//span[@id="Players"]/following::table[1]//span[@class="fn"]//@href')
for player_suffix_url in club_players:
    print(player_suffix_url + '\n')
you get the URLs of the 27 players in Arsenal's first team:
/wiki/Bernd_Leno
/wiki/H%C3%A9ctor_Beller%C3%ADn
/wiki/Kieran_Tierney
/wiki/Sokratis_Papastathopoulos
/wiki/Dani_Ceballos
...
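To see why anchoring on the section heading isolates a single table, here is a minimal offline sketch. The SAMPLE markup and the first_table_links helper are made up for illustration and are not Wikipedia's real HTML; the only thing carried over from above is the shape of the XPath. The expression finds the span with the given id, then following::table[1] takes the nearest table after it in document order, so later tables (loans, youth players) are left out.

```python
from lxml import html

# Hypothetical miniature page (NOT real Wikipedia markup): one unrelated
# table, then a "Players" heading followed by two squad tables.
SAMPLE = """
<html><body>
  <table><tr><td><span class="fn"><a href="/wiki/Unrelated">X</a></span></td></tr></table>
  <h2><span id="Players">Players</span></h2>
  <table>
    <tr><td><span class="fn"><a href="/wiki/First_A">A</a></span></td></tr>
    <tr><td><span class="fn"><a href="/wiki/First_B">B</a></span></td></tr>
  </table>
  <h3><span id="Out_on_loan">Out on loan</span></h3>
  <table>
    <tr><td><span class="fn"><a href="/wiki/Loan_C">C</a></span></td></tr>
  </table>
</body></html>
"""


def first_table_links(page_source, anchor_id):
    """Return hrefs from the first table that follows the span with anchor_id."""
    doc = html.fromstring(page_source)
    # $anchor is an XPath variable; lxml binds it from the keyword argument.
    query = '//span[@id=$anchor]/following::table[1]//span[@class="fn"]//@href'
    return [str(href) for href in doc.xpath(query, anchor=anchor_id)]


if __name__ == "__main__":
    for href in first_table_links(SAMPLE, "Players"):
        print(href)
```

Passing a different id (for example "Out_on_loan" in this sample) selects the table after that heading instead. On the live page the ids and classes may change over time, so treat the selector as something to re-check against the current source.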
Upvotes: 1