Glenn Davies

Reputation: 37

How to extract just the number from HTML?

I am trying to extract only the number from this HTML element:

<td bgcolor="green">
    <font color="white">
        "49.8 "
        <small>dBmV</small>
    </font>
</td>

How do I extract only the 49.8 without also getting the dBmV?

I am able to use the XPath of the font element to return all of "49.8 dBmV", but when I use the XPath of just the "49.8" text node I receive an error.

Error:

invalid selector: The result of the xpath expression "/html/body/p[1]/table/tbody/tr/td/table[2]/tbody/tr[2]/td[4]/font/text()" is: [object Text]. It should be an element. 

I have tried:

browser.find_element_by_xpath("/html/body/p[1]/table/tbody/tr/td/table[2]/tbody/tr[2]/td[4]/font").text

which returns 49.8 dBmV

And then:

browser.find_element_by_xpath("/html/body/p[1]/table/tbody/tr/td/table[2]/tbody/tr[2]/td[4]/font/text()").text

returns the exception above.

I just want the number 49.8 (which obviously changes). I know I could extract the number afterwards, but I'm hoping there is something a bit tidier that gets the value directly from the HTML.

Upvotes: 1

Views: 504

Answers (4)

Moshe Slavin

Reputation: 5204

You can replace the extra text like this:

first_text = browser.find_element_by_xpath("/html/body/p[1]/table/tbody/tr/td/table[2]/tbody/tr[2]/td[4]/font").text
second_text = browser.find_element_by_xpath("/html/body/p[1]/table/tbody/tr/td/table[2]/tbody/tr[2]/td[4]/font/small").text
only_first_text = first_text.replace(second_text, '')
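One caveat: str.replace() removes every occurrence of the child text, not only the trailing one. A minimal sketch of the same idea that strips just the trailing unit is shown here. The helper name text_without_child is hypothetical, and the two strings are assumed to be the .text values of the font element and its small child.

```python
def text_without_child(parent_text: str, child_text: str) -> str:
    """Return parent_text with a trailing child_text removed, then stripped."""
    parent_text = parent_text.strip()
    child_text = child_text.strip()
    # Only cut the suffix when the child text really is at the end.
    if child_text and parent_text.endswith(child_text):
        parent_text = parent_text[: -len(child_text)]
    return parent_text.strip()


# In the answer above, these two strings would come from the .text of the
# <font> element and of its <small> child (assumed values from the question).
print(text_without_child("49.8 dBmV", "dBmV"))
```

With the values from the question this leaves only the reading, as a string.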

Upvotes: 1

bertilnilsson

Reputation: 304

The find_element_by_xpath API in Selenium can only return elements. Even though an XPath expression ending in text() can select just the text node you are looking for, Selenium rejects that result, so this cannot be done with XPath alone.
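One possible way around this, sketched under assumptions, is to fetch the element's markup as a string and read the leading text node with a parser outside Selenium. The sketch below uses the standard-library xml.etree.ElementTree on the sample markup from the question. It assumes the fragment is well-formed XML, which real-world HTML often is not, and the helper name leading_text is hypothetical. In a live session the fragment might come from something like element.get_attribute("outerHTML").

```python
import xml.etree.ElementTree as ET

# Sample markup from the question, with the text node written without the
# quotes that browser dev tools display around it (an assumption).
SAMPLE_HTML = """<td bgcolor="green">
    <font color="white">
        49.8
        <small>dBmV</small>
    </font>
</td>"""


def leading_text(fragment: str, tag: str = "font") -> str:
    """Return the text that precedes the first child of the first <tag>."""
    root = ET.fromstring(fragment)
    node = root if root.tag == tag else root.find(f".//{tag}")
    if node is None:
        raise ValueError(f"no <{tag}> element in fragment")
    # Element.text holds only the text before the first child element,
    # so the <small> unit is not included.
    return (node.text or "").strip()


print(leading_text(SAMPLE_HTML))
```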

Upvotes: 0

undetected Selenium

Reputation: 193088

To extract the text 49.8 you can use either of the following Locator Strategies:

  • Using xpath through execute_script() and textContent:

    print(driver.execute_script('return arguments[0].firstChild.textContent;', driver.find_element_by_xpath("//td[@bgcolor='green']/font[@color='white']")).strip())
    
  • Using xpath through splitlines() and get_attribute():

    print(driver.find_element_by_xpath("//td[@bgcolor='green']/font[@color='white']").get_attribute("innerHTML").splitlines()[1])
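Note that the splitlines() variant depends on how the page source is laid out. The sketch below runs the same indexing on a plain string that stands in for the innerHTML of the font element. That string is an assumption based on the formatting shown in the question, and the helper name second_line is hypothetical.

```python
# Stand-in for get_attribute("innerHTML") on the <font> element, assuming the
# page source is indented exactly as shown in the question.
INNER_HTML = "\n        49.8 \n        <small>dBmV</small>\n    "


def second_line(inner_html: str) -> str:
    """Return the second line of the markup, stripped, or '' if there is none."""
    lines = inner_html.splitlines()
    # Index 0 is the empty remainder of the opening-tag line; index 1 is the
    # line that carries the number, padded with indentation.
    return lines[1].strip() if len(lines) > 1 else ""


print(second_line(INNER_HTML))
# If the markup sits on a single line there is no second line to pick.
print(repr(second_line("49.8 <small>dBmV</small>")))
```

The strip() call matters here because the raw line still carries its indentation, and the single-line case shows why the first variant with execute_script() may be the more robust of the two.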
    

Upvotes: 2

Israel Pechman

Reputation: 11

You can reuse your first attempt, which returns "49.8 dBmV", and keep only the number like this:

text_num = browser.find_element_by_xpath("/html/body/p[1]/table/tbody/tr/td/table[2]/tbody/tr[2]/td[4]/font").text
print(float(text_num.split()[0]))

Hope this helped!
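If the text might not always contain a space between the value and the unit, a regular expression is a slightly more forgiving variant of the same idea. This is only a sketch: the helper name first_number is hypothetical, and the pattern assumes a plain decimal number with an optional sign.

```python
import re

# Optional sign, digits, optional fractional part (assumed number format).
_NUMBER = re.compile(r"[-+]?\d+(?:\.\d+)?")


def first_number(text: str) -> float:
    """Return the first decimal number found in text as a float."""
    match = _NUMBER.search(text)
    if match is None:
        raise ValueError(f"no number found in {text!r}")
    return float(match.group())


# The input would be the .text of the <font> element from the question.
print(first_number("49.8 dBmV"))
```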

Upvotes: 1
