MrClean

Reputation: 1460

Extract Data From SVG

I have the following code saved to a local html file

<object id="PriceAdvisorFrame" type="image/svg+xml" data="https://www.kbb.com/Api/3.9.448.0/71071/vehicle/upa/PriceAdvisor/meter.svg?action=Get&amp;intent=buy-used&amp;pricetype=Private Party&amp;zipcode=99517&amp;vehicleid=439604&amp;hideMonthlyPayment=True&amp;condition=verygood&amp;mileage=11795" style="width: 100%;"></object>

I am trying to extract the cost from the HTML when it is rendered in a Chrome browser. The markup I am trying to parse is shown below. However, this markup does not appear when the file is requested using Selenium.

<text xmlns="http://www.w3.org/2000/svg" text-anchor="middle" font-size="14" font-weight="700" fill="#333333" y="-8">$27,938</text>
<text xmlns="http://www.w3.org/2000/svg" text-anchor="middle" font-size="14" font-weight="400" fill="#333333" y="-26">Private Party Value</text>
<text xmlns="http://www.w3.org/2000/svg" text-anchor="middle" font-size="20" font-weight="700" fill="#ffffff" y="-48">$26,995 - $28,888</text>
<text xmlns="http://www.w3.org/2000/svg" text-anchor="middle" font-size="14" font-weight="400" fill="#ffffff" y="-68.8">Private Party Range</text>

Here is my code thus far:

from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument('headless')
user_agent = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.50 Safari/537.36'    
options.add_argument('user-agent={0}'.format(user_agent))
driver = webdriver.Chrome(chrome_options=options)

driver.get('file:///F:/Onedrive/Python/KBB/test.html')
print(driver.find_element_by_css_selector('text').text)

Any ideas on how to make this work?

Upvotes: 1

Views: 4186

Answers (3)

KunduK

Reputation: 33384

To access an SVG element, you need to use one of the following XPath expressions.

//*[name()='text']

or

//*[local-name()='text']

Try the below code.

elements = WebDriverWait(driver, 10).until(
    EC.visibility_of_all_elements_located((By.XPATH, "//*[name()='text']")))
for ele in elements:
    print(ele.text)

To execute the above code, you need the following imports.

from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
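
As a rough illustration of why a name-based match is needed, the sketch below uses Python's standard xml.etree.ElementTree rather than Selenium (an assumption made only to keep the example self-contained; the `{*}` wildcard needs Python 3.8 or later). The sample markup is a trimmed, hypothetical stand-in for the real SVG. A plain `text` lookup finds nothing because the element lives in the SVG namespace, while a namespace-agnostic lookup, similar in spirit to local-name(), finds it.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed sample; the real meter.svg is assumed to be namespaced like this.
svg = (
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<g><text y="-8">$27,938</text></g>'
    '</svg>'
)
root = ET.fromstring(svg)

# A bare tag name only matches elements that are in no namespace.
plain = root.findall(".//text")

# "{*}text" matches <text> in any namespace (Python 3.8+).
any_ns = root.findall(".//{*}text")

print(len(plain), [el.text for el in any_ns])
```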

Upvotes: 1

QHarr

Reputation: 84465

When the HTML file is loaded in the browser, the desired information is not in driver.page_source, so you cannot select it this way. The browser makes a separate GET request to the URL in the data attribute and renders the returned SVG; the local file itself is not updated. You could instead call .get on the data URL directly, or fetch it with requests.


from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait 
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome(r'path\chromedriver.exe')
driver.get(r'C:\Users\User\Desktop\test.html')
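print(driver.page_source)  # the <text> elements are not in the local file's source
# Navigate straight to the SVG URL held in the <object> element's data attribute
driver.get(driver.find_element_by_css_selector('[data]').get_attribute('data'))
# until() raises TimeoutException if no <text> element appears within 10 seconds
elem = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, 'text')))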
if elem is not None:
    print(elem.text)
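
If the SVG markup has already been downloaded as a string (for example with requests, as mentioned above), a browser may not be needed at all. The following is a minimal sketch under that assumption, using only the standard library. The helper name extract_svg_text is hypothetical, and the sample is trimmed from the markup quoted in the question, with the enclosing svg element assumed.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

def extract_svg_text(svg_markup):
    """Return the text content of every SVG <text> element in the markup."""
    root = ET.fromstring(svg_markup)
    # ElementTree spells namespaced tags as "{namespace}tag".
    return [el.text for el in root.iter("{%s}text" % SVG_NS) if el.text]

# Trimmed sample based on the question; the wrapping <svg> element is an assumption.
sample = (
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<text y="-8">$27,938</text>'
    '<text y="-26">Private Party Value</text>'
    '<text y="-48">$26,995 - $28,888</text>'
    '<text y="-68.8">Private Party Range</text>'
    '</svg>'
)

# Keep only the entries that look like prices.
prices = [t for t in extract_svg_text(sample) if "$" in t]
print(prices)
```

Whether the remote endpoint returns the same markup to a plain HTTP client as it does to a browser is not something this sketch can confirm.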

Upvotes: 1

frianH

Reputation: 7563

The 'text' you are looking for is a tag name rather than a CSS selector. You can use .find_elements_* to collect all matching elements and then extract the text from each of them.

driver.get('file:///F:/Onedrive/Python/KBB/test.html')

elements = driver.find_elements_by_tag_name('text')
for element in elements:
    text = element.text
    if "$" in text:
        print(text)

Upvotes: 1
