Mahesh Gupta

Reputation: 165

How to extract text from svg using python selenium

I'm trying to scrape the price from this page: https://www.kbb.com/cadillac/deville/1996/sedan-4d/

The prices are shown in text elements inside an svg element.

When I use the XPath .//*[name()='svg']//*[name()='g']//*[name()='text'] in the browser's inspect-element search, it finds the elements. But the same XPath does not work in my code.

The current code is:

def get_price(url):
    driver.get(url)
    time.sleep(10)
    # find_elements_* returns an empty list (it does not raise) when nothing matches
    price_tags = driver.find_elements_by_xpath(".//*[name()='svg']//*[name()='g']//*[name()='text']")
    if not price_tags:
        print("price not found")

    for p in price_tags:
        print(p.text)

When I run this code, find_elements_by_xpath returns an empty list. I also tried switching to the default content, because the element sits inside a #document node:

driver.switch_to_default_content()

but that didn't work either. If there is another way to scrape the price, please let me know.

Upvotes: 1

Views: 2995

Answers (1)

furas

Reputation: 142651

The SVG is an external file, and it seems Selenium doesn't have it in the DOM. So I had to find the <object> element that references the SVG file, read the file's URL from its data attribute, download the file with requests, and extract the text with BeautifulSoup.

from selenium import webdriver
import time
import requests
from bs4 import BeautifulSoup

url = 'https://www.kbb.com/cadillac/deville/1996/sedan-4d/'

driver = webdriver.Firefox()
driver.get(url)
time.sleep(5)

# doesn't work - always empty list
#price_xpaths = driver.find_elements_by_xpath(".//*[name()='svg']//*[name()='g']//*[name()='text']")
#price_xpaths = driver.find_elements_by_xpath('//svg')
#price_xpaths = driver.find_elements_by_xpath('//svg//g//text')
#price_xpaths = driver.find_elements_by_xpath('//*[@id="PriceAdvisor"]')
#print(price_xpaths)  # always empty list

# single element `object`
svg_item = driver.find_element_by_xpath('//object[@id="PriceAdvisorFrame"]')

# doesn't work - always empty string
#print(svg_item.get_attribute('innerHTML'))

# get the URL of the SVG file
svg_url = svg_item.get_attribute('data')
print(svg_url)

# download it and parse
r = requests.get(svg_url)
soup = BeautifulSoup(r.content, 'html.parser')

text_items = soup.find_all('text')
for item in text_items:
    print(item.text)

Result:

Fair Market Range
$1,391 - $2,950
Fair Purchase Price
$2,171
Typical
Listing Price
$2,476
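
If BeautifulSoup is not available, the same extraction step could probably be done with the standard library. This is only a sketch: extract_svg_text and sample_svg are names I made up, the sample markup is hypothetical (it is not the real KBB file), and it assumes the SVG declares the usual http://www.w3.org/2000/svg namespace.

```python
import xml.etree.ElementTree as ET

# Assumption: the SVG uses the standard SVG namespace on its root element.
SVG_NS = "{http://www.w3.org/2000/svg}"


def extract_svg_text(svg_markup):
    """Return the non-empty text of every <text> element in an SVG document."""
    root = ET.fromstring(svg_markup)
    texts = []
    for element in root.iter(SVG_NS + "text"):
        # itertext() also picks up text nested in child elements such as <tspan>
        content = "".join(element.itertext()).strip()
        if content:
            texts.append(content)
    return texts


# Hypothetical sample markup for illustration only
sample_svg = """<svg xmlns="http://www.w3.org/2000/svg">
  <g>
    <text>Fair Market Range</text>
    <text><tspan>$1,391 - $2,950</tspan></text>
  </g>
</svg>"""

print(extract_svg_text(sample_svg))
# ['Fair Market Range', '$1,391 - $2,950']
```

In the script above, r.content could be passed in place of sample_svg, since ET.fromstring accepts bytes as well. If the file does not declare the namespace, the lookup would find nothing and SVG_NS would have to be an empty string.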


BTW, for other users: I had to use a proxy/VPN with a US IP address to see this page. From an IP address in Poland (PL) it displays:

Access Denied. 
You don't have permission to access "http://www.kbb.com/cadillac/deville/1996/sedan-4d/" on this server.

Sometimes I get this message even from a US IP address.
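
Because the block page shows up unpredictably, it may help to check for it before looking for the <object> element. A minimal sketch, assuming the block page always contains the phrase "Access Denied" as in the message above; is_access_denied is a hypothetical helper, not part of Selenium or requests.

```python
def is_access_denied(page_text):
    """Return True if the text looks like the 'Access Denied' block page."""
    # Assumption: the block page always contains this phrase
    return "access denied" in page_text.lower()


blocked_page = (
    "Access Denied. You don't have permission to access "
    '"http://www.kbb.com/cadillac/deville/1996/sedan-4d/" on this server.'
)
normal_page = "Fair Purchase Price $2,171"

print(is_access_denied(blocked_page))  # True
print(is_access_denied(normal_page))   # False
```

In the script above it could be called as is_access_denied(driver.page_source) right after the time.sleep(5), so a blocked request is reported instead of failing later when the <object> element is missing.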

Upvotes: 3
