Reputation: 185
I would like to find the correct XPath for my scraper.
What I'm trying to do: Scrape the market value of a player.
Problem: The market value only appears in the HTML when the mouse hovers over the chart, either the path or the club images; I don't know exactly which.
Code:
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
import time
url = 'https://www.transfermarkt.de/manuel-neuer/marktwertverlauf/spieler/17259'
driver = webdriver.Chrome()
driver.implicitly_wait(30)
driver.get(url)
time.sleep(5)
actions = ActionChains(driver)
actions.move_to_element_by_xpath('//*[@id="highcharts-0"]/div/span')
actions.move_to_element_by_xpath('//*[@id="highcharts-0"]/svg/g[5]/g[1]/path[1]')
actions.move_to_element_by_xpath('//*[@id="highcharts-0"]/svg/g[5]/g[2]/image[33]')
actions.perform()
date = driver.find_element_by_xpath('//*[@id="highcharts-0"]/div/span/b[1]').text
value = driver.find_element_by_xpath('//*[@id="highcharts-0"]/div/span/b[2]').text
club = driver.find_element_by_xpath('//*[@id="highcharts-0"]/div/span/b[3]').text
age = driver.find_element_by_xpath('//*[@id="highcharts-0"]/div/span/b[4]').text
print(date, value, club, age)
If I run this code, it returns an error, because I guess the date, value, club, and age only show up while hovering over the path.
If I manually move the mouse over the club images in the svg, it returns the right data.
So, how do I find the correct xpath for the move_to_element_by_xpath here?
I've tried so many combinations.
Upvotes: 1
Views: 251
Reputation: 84465
This is not a clean solution, as I am treating a JavaScript object as if it could be converted to valid JSON. I extract the data from the script tag where the chart values are generated. There are some encoding issues to overcome, which @poke helped with.
import json
import requests
from bs4 import BeautifulSoup as bs

url = 'https://www.transfermarkt.de/manuel-neuer/marktwertverlauf/spieler/17259'
headers = {
    'Host': 'www.transfermarkt.de',
    'Referer': 'https://www.transfermarkt.de/manuel-neuer/marktwertverlauf/spieler/17259',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36',
}
res = requests.get(url, headers=headers)
soup = bs(res.content, 'lxml')
scripts = soup.select('script[type="text/javascript"]')
# Keep only the inline scripts that contain a CDATA section.
script = [script.text for script in scripts if 'CDATA' in script.text]
# script[1] is used below, so at least two matching scripts are required.
if len(script) > 1:
    # Cut out the 'series' array and swap the quotes so it resembles JSON.
    s = script[1].split("'series':")[1].split(",'credits'")[0].replace("'", '"')
    # JSON does not accept \xAB escapes, so rewrite them as \u00AB.
    data = json.loads(s.replace('\\x', '\\u00'))
    for item in data[0]['data']:
        print('Team: ' + item['verein'])
        print('Age: ' + str(item['age']))
        print('Date: ' + str(item['datum_mw']))
        print('Value: ' + str(item['y']))
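The parsing steps above can be tried offline, without a request, on a small sample. This is only a sketch: the sample_script string and the extract_series helper are hypothetical stand-ins invented for illustration, and the real page's script may be laid out differently. It shows the same three steps: cut out the 'series' array, swap single quotes for double quotes, and rewrite \x escapes as \u00 so that json.loads accepts them.

```python
import json

# Hypothetical sample shaped like the chart script; not real data from the site.
sample_script = (
    r"new Highcharts.Chart({'series':[{'name':'Sample','data':"
    r"[{'y':1000,'verein':'Beispiel M\xFCnchen','age':20,'datum_mw':'01.01.2000'}]}]"
    r",'credits':{'enabled':false}});"
)


def extract_series(text):
    """Return the 'series' array of a Highcharts-style script as Python objects."""
    # Keep only the text between "'series':" and ",'credits'".
    raw = text.split("'series':")[1].split(",'credits'")[0]
    # JSON needs double quotes and has no \xAB escape, only \uABCD.
    as_json = raw.replace("'", '"').replace('\\x', '\\u00')
    return json.loads(as_json)


series = extract_series(sample_script)
first_point = series[0]['data'][0]
print(first_point['verein'], first_point['y'])
```

One limitation of the quote swap: a value that itself contains an apostrophe would produce invalid JSON, so this approach only works while the data is free of single quotes.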
As @poke explained to me:
"The code uses \xAB as escape sequences where AB is a hexadecimal number that references a character. The other valid escape sequence is \uABCD with ABCD as a hexadecimal number. In general, \xAB is equivalent to \u00AB since that’s how Unicode code points are made. So you can convert from one to the other. And since \uABCD are valid escape sequences within JSON, you can parse that."
Upvotes: 2
Reputation: 892
From what I can gather, the tooltip gets its data from https://www.transfermarkt.de/fc-bayern-munchen/startseite/verein/27, so scrape the data from that link instead. There the data is available without any tooltips, and you can easily find the XPath for it on that page.
Upvotes: 0