BOB123

Difference between XPath query results in Python and in the browser

I'm trying to iterate over the current players of a particular football team. I've noticed that on Wikipedia the players who belong to a team are always listed in the same format: there are 4-6 tables, the first 2 for the first-team squad and the rest for players out on loan, youth players, etc. When I query the Wikipedia page with XPath using online tools, I get the result I want. But when I use Python with the lxml.html and requests libraries, instead of seeing the players as 4-6 separate tables it sees them as one table element, which makes it a headache to extract only the first-team players.

Here is my Python code:

import lxml.html
import requests

# wiki_prefix and get_player_info() are defined elsewhere in my code
def create_team_ontology(ontology_graph, team_url, team_name):
    res = requests.get(team_url)
    doc = lxml.html.fromstring(res.content)
    print(team_url)
    # Intended: player links from the first-team squad table only
    club_players = doc.xpath("//table[3]/tbody//tr[position() > 1]//td[4]//span/a/@href")
    for player_suffix_url in club_players:
        print(player_suffix_url + '\n')
        player_url = wiki_prefix + player_suffix_url
        get_player_info(ontology_graph, player_url, team_name)

Here is an example, the Wikipedia page of Arsenal: https://en.wikipedia.org/wiki/Arsenal_F.C. In the page source it's easy to check that each table is a separate element, but my club_players list contains the hrefs of all the players under the Players section of that page.

And this is the query I run in the browser (open the inspector, then press Ctrl+F):

//table[3]/tbody//tr[position() > 1]//td[4]//span/a/@href
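A note on the positional predicate in that query: in XPath, //table[3] abbreviates /descendant-or-self::node()/child::table[3], so it selects every table that is the third table child of its own parent, not the third table on the page. The form that means "the third table in document order" is (//table)[3]. When tables are nested inside other elements the two can pick different elements. The small HTML below is made up purely for illustration (it is not the Arsenal page's real markup), and whether this is what causes the mismatch above is not established here.

```python
import lxml.html

# Made-up markup: <body> has tables a, e, f as direct children,
# and a <div> (between a and e) holds tables b, c, d.
SAMPLE = """
<html><body>
  <table id="a"></table>
  <div>
    <table id="b"></table>
    <table id="c"></table>
    <table id="d"></table>
  </div>
  <table id="e"></table>
  <table id="f"></table>
</body></html>
"""

doc = lxml.html.fromstring(SAMPLE)

# Every <table> that is the 3rd <table> child of its own parent:
# d (3rd table inside the div) and f (3rd table directly under body).
per_parent = [t.get("id") for t in doc.xpath("//table[3]")]

# The 3rd <table> in the whole document, counting in document order.
global_third = [t.get("id") for t in doc.xpath("(//table)[3]")]

print(per_parent)    # ['d', 'f']
print(global_third)  # ['c']
```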

Answers (1)

E.Wiest

Your code almost works. If I use the XPath I posted in the other topic, like this:

from lxml import html
import requests

res = requests.get('https://en.wikipedia.org/wiki/Arsenal_F.C.')
doc = html.fromstring(res.content)
# First <table> after the "Players" section anchor, then every player-name link inside it
club_players = doc.xpath('//span[@id="Players"]/following::table[1]//span[@class="fn"]//@href')
for player_suffix_url in club_players:
    print(player_suffix_url + '\n')

you get the URLs of the 27 Arsenal first-team players:

/wiki/Bernd_Leno

/wiki/H%C3%A9ctor_Beller%C3%ADn

/wiki/Kieran_Tierney

/wiki/Sokratis_Papastathopoulos

/wiki/Dani_Ceballos

...
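If you want full URLs rather than suffixes (the question builds them with wiki_prefix), here is a minimal sketch that wraps the XPath above in a function and resolves each href with urllib.parse.urljoin. The function name first_team_player_urls and the base_url default are hypothetical, chosen for illustration, and the sample page (including the "Loan_Player" link) is invented to show that following::table[1] stops at the first table after the Players anchor; it does not reproduce Wikipedia's real markup.

```python
from urllib.parse import urljoin

import lxml.html

# XPath from the answer: first <table> after the element with id="Players",
# then the href of every link inside a <span class="fn"> in that table.
FIRST_TEAM_XPATH = '//span[@id="Players"]/following::table[1]//span[@class="fn"]//@href'


def first_team_player_urls(page_html, base_url="https://en.wikipedia.org/"):
    """Hypothetical helper: absolute player URLs from the first table after "Players"."""
    doc = lxml.html.fromstring(page_html)
    hrefs = doc.xpath(FIRST_TEAM_XPATH)
    # urljoin turns "/wiki/Bernd_Leno" into "https://en.wikipedia.org/wiki/Bernd_Leno"
    return [urljoin(base_url, str(href)) for href in hrefs]


# Invented page: a first-team table followed by a loan table.
SAMPLE = """
<html><body>
  <h2><span id="Players">Players</span></h2>
  <table>
    <tr><td><span class="fn"><a href="/wiki/Bernd_Leno">Bernd Leno</a></span></td></tr>
    <tr><td><span class="fn"><a href="/wiki/Kieran_Tierney">Kieran Tierney</a></span></td></tr>
  </table>
  <h3>Out on loan</h3>
  <table>
    <tr><td><span class="fn"><a href="/wiki/Loan_Player">Loan Player</a></span></td></tr>
  </table>
</body></html>
"""

for url in first_team_player_urls(SAMPLE):
    print(url)
```

On the live page you would pass res.content from the requests call instead of SAMPLE; lxml.html.fromstring accepts bytes as well as str.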
