Finding XPath in a path inside svg

Question

I would like to find the correct XPath for my scraper.

What I'm trying to do: Scrape the market value of a player.

Problem: The market value only appears in the HTML when the mouse hovers over the chart, either over the path or over the club images; I'm not sure which.

Code:

from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
import time

url = 'https://www.transfermarkt.de/manuel-neuer/marktwertverlauf/spieler/17259'

driver = webdriver.Chrome()
driver.implicitly_wait(30)
driver.get(url)
time.sleep(5)

actions = ActionChains(driver)
actions.move_to_element_by_xpath('//*[@id="highcharts-0"]/div/span')
actions.move_to_element_by_xpath('//*[@id="highcharts-0"]/svg/g[5]/g[1]/path[1]')
actions.move_to_element_by_xpath('//*[@id="highcharts-0"]/svg/g[5]/g[2]/image[33]')
actions.perform()

date = driver.find_element_by_xpath('//*[@id="highcharts-0"]/div/span/b[1]').text
value = driver.find_element_by_xpath('//*[@id="highcharts-0"]/div/span/b[2]').text
club = driver.find_element_by_xpath('//*[@id="highcharts-0"]/div/span/b[3]').text
age = driver.find_element_by_xpath('//*[@id="highcharts-0"]/div/span/b[4]').text

print(date, value, club, age)

If I run this code, it raises an error, presumably because the date, value, club, and age only appear when hovering over the path.

If I manually move the mouse over the club images in the svg, it returns the right data.

So, how do I find the correct XPath to pass to move_to_element_by_xpath here?

I've tried so many combinations.

QHarr · Accepted Answer

This is not a clean solution, as I am treating a JavaScript object as if it could be converted to valid JSON. I extract the data from the script tag where the values are generated. There are some encoding issues to overcome, which @poke helped with.

import requests
from bs4 import BeautifulSoup as bs
import json

url = 'https://www.transfermarkt.de/manuel-neuer/marktwertverlauf/spieler/17259'
headers = {
    'Host': 'www.transfermarkt.de',
    'Referer': 'https://www.transfermarkt.de/manuel-neuer/marktwertverlauf/spieler/17259',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36',
}
res = requests.get(url, headers = headers)
soup = bs(res.content,'lxml')
scripts = soup.select('script[type="text/javascript"]')
script = [script.text for script in scripts if 'CDATA' in script.text]


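# script[1] is indexed below, so at least two matching scripts are needed
if len(script) > 1:
    # cut out the 'series' array and swap single quotes for JSON double quotes
    s = script[1].split("'series':")[1].split(",'credits'")[0].replace("'", '"')
    # rewrite \xAB escapes as \u00AB so that json.loads accepts them
    data = json.loads(s.replace('\\x', '\\u00'))
    for item in data[0]['data']:
        print('Team: ' + item['verein'])
        print('Age: ' + str(item['age']))
        print('Date: ' + str(item['datum_mw']))
        print('Value: ' + str(item['y']))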

As @poke explained to me:

"The code uses \xAB as escape sequences where AB is a hexadecimal number that references a character. The other valid escape sequence is \uABCD with ABCD as a hexadecimal number. In general, \xAB is equivalent to \u00AB since that’s how Unicode code points are made. So you can convert from one to the other. And since \uABCD are valid escape sequences within JSON, you can parse that."
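
To see the escape conversion in isolation, here is a minimal sketch that runs without a network request. The `raw` string is a made-up fragment written in the style of the page's inline script (the real script content may differ), and `js_literal_to_json` is a hypothetical helper name, not part of any library.

```python
import json

# Hypothetical sample, not real page data: single-quoted keys and a \xFC
# escape, neither of which json.loads accepts as-is.
raw = r"[{'verein':'FC Bayern M\xFCnchen','age':32,'y':1500}]"


def js_literal_to_json(text):
    """Rewrite \\xAB escapes as \\u00AB and single quotes as double quotes."""
    # Naive on purpose: this breaks if a value itself contains an apostrophe.
    return text.replace("\\x", "\\u00").replace("'", '"')


data = json.loads(js_literal_to_json(raw))
print(data[0]["verein"], data[0]["age"], data[0]["y"])
```

Note the doubled backslashes in the `replace` call: a bare `'\x'` is not a valid Python string literal, so the backslash has to be escaped (or a raw string used) to match the literal two characters `\x` in the script text.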
