How to access a text element in Selenium when it is split by b tags

Question

I am having trouble extracting some values from a website while web scraping. The text I want sits inside an element whose class contains several pieces of text separated by bold (b) tags, and the text inside those b tags matters to me as well.

My first attempt was to find the b tag containing the label I need ('Category' in this case) and then extract the category value from the text that follows that tag. I cannot use an exact, position-based XPath, because the other pages I need to scrape have a different number of rows in this sidebar, so the positions and the XPaths differ from page to page.

The expected output is 'utility' - the category in the sidebar.

The website and the text I need to extract look like this (see the sidebar on the right containing 'Category'):

The element looks like this:

And the code I tried:

from selenium import webdriver

driver = webdriver.Safari()
driver.get('https://www.statsforsharks.com/entry/MC_Squares')
element = driver.find_elements_by_xpath("//b[contains(text(), 'Category')]/following-sibling")
for value in element:
    print(value.text)
driver.close()

The link to the page with the data is https://www.statsforsharks.com/entry/MC_Squares.

Thank you!

Prab G · Accepted Answer

You might be better off using regex here, because all of the text sits under the 'company-sidebar-body' class, where some of it is wrapped in b tags and some is not.

So, you can get the text of that class first:

sidebartext = driver.find_element_by_class_name("company-sidebar-body").text

That will give you the following:

"EOY Proj Sales: $1,000,000 Sales Prev Year: $200,000 Category: Utility Asking Deal Equity: 10% Amount: $300,000 Value: $3,000,000 Equity Deal Sharks: Kevin O'Leary Equity: 25% Amount: $300,000 Value: $1,200,000 Bite: -$1,800,000"

You can then use regex to target the category:

import re

c = re.search(r"Category:\s\w+", sidebartext).group()

print(c)

c will hold 'Category: Utility', which you can then work with. This also works when the category value ('Utility') is different on other pages, as long as it is a single word, since \w+ stops at the first space.
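If you only want the value itself rather than the whole 'Category: Utility' string, a capture group is one way to get it. The following is a minimal sketch, not part of the original answer: the helper name extract_category is made up for illustration, and the sample string is a shortened copy of the sidebar text shown above, so no browser is needed to try it.

```python
import re

# Shortened copy of the sidebar text from the answer; on a real page this
# would come from the sidebar element's .text attribute.
sidebartext = (
    "EOY Proj Sales: $1,000,000 Sales Prev Year: $200,000 "
    "Category: Utility Asking Deal Equity: 10% Amount: $300,000 "
    "Value: $3,000,000"
)


def extract_category(text):
    """Return the single word after 'Category:', or None if the label is absent.

    Hypothetical helper for illustration only.
    """
    # The parentheses capture just the value; \s* tolerates spaces or a newline.
    match = re.search(r"Category:\s*(\w+)", text)
    return match.group(1) if match else None


category = extract_category(sidebartext)
print(category)  # Utility
```

Checking for None before calling group avoids an AttributeError on pages whose sidebar has no 'Category' row. If you need the lowercase 'utility' from the question, call category.lower() on the result. A multi-word category would be cut off at the first space, so the pattern would need adjusting for that case.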
